# 17.Towhee介绍
Towhee (opens new window)是一个流式处理框架,基于各种算子,把各种非结构化数据进行数据处理。非结构化数据主要包含音频、图片、视频、文本。下边是官方英文介绍:
Towhee is a cutting-edge framework designed to streamline the processing of unstructured data through the use of Large Language Model (LLM) based pipeline orchestration. It is uniquely positioned to extract invaluable insights from diverse unstructured data types, including lengthy text, images, audio and video files. Leveraging the capabilities of generative AI and the SOTA deep learning models, Towhee is capable of transforming this unprocessed data into specific formats such as text, image, or embeddings. These can then be efficiently loaded into an appropriate storage system like a vector database. Developers can initially build an intuitive data processing pipeline prototype with user friendly Pythonic APU, then optimize it for production environments.
几个核心功能:
1. LLM 管道编排 Towhee 具有灵活性,可以适应不同的大语言模型(LLM)。此外,它允许在本地托管开源大模型。此外,Towhee 提供了prompt管理和知识检索等功能,使与这些 LLM 的交互更加高效和有效。
2. 各种算子 [Operators](https://towhee.io/tasks/operator)
3. 数据处理 API(DataCollection): DataCollection API 是用于描述流水线的编程接口。提供多种数据转换接口:map, filter, flat_map, concat, window, time_window以及window_all,通过这些接口,可以快速构建复杂的数据处理管道,处理视频,音频,文本,图像等非结构化数据。
- 一个计算文本相似度的例子
#sentence
from towhee import pipe, ops, DataCollection
#默认会从HF下载sentence-transformers/all-MiniLM-L6-v2。国内下载可能一直不成功,大家可以去Gitee上找下地址,提前下载到本地。
p = (pipe.input('text')
.map('text', 'vec',
ops.sentence_embedding.transformers(model_name='/home/models/sentence-transformers/all-MiniLM-L6-v2'))
.output('text', 'vec')
)
DataCollection(p('Hello, world.')).show()
DataCollection(p('Hello')).show()
print(p('Hello').get())
2
3
4
5
6
7
8
9
10
11
12
13
引用