infinity
Add a TextSplitter in LangChain that shares the embedding model's tokenizer
Feature request
Have you ever thought of adding an API endpoint that could also serve as a TextSplitter? It would remove the need to load the same model into memory twice: once for the text chunker and once for the embedder.
https://python.langchain.com/docs/modules/data_connection/document_transformers/split_by_token#sentencetransformers
Motivation
Create a LangChain TextSplitter that uses the embedding model's tokenizer to chunk long documents.
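To illustrate the idea, here is a minimal sketch of tokenizer-based chunking. The function name, parameters, and the whitespace "tokenizer" in the demo are all hypothetical; in a real setup, `encode`/`decode` would come from the embedding model's own tokenizer (e.g. loaded via `transformers.AutoTokenizer`, which fetches only the tokenizer files, not the model weights):

```python
from typing import Callable, List

def split_by_tokens(
    text: str,
    encode: Callable[[str], List[str]],
    decode: Callable[[List[str]], str],
    tokens_per_chunk: int = 256,
    overlap: int = 32,
) -> List[str]:
    """Chunk `text` into windows of at most `tokens_per_chunk` tokens,
    with `overlap` tokens shared between consecutive chunks."""
    tokens = encode(text)
    step = tokens_per_chunk - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + tokens_per_chunk]
        chunks.append(decode(window))
        if start + tokens_per_chunk >= len(tokens):
            break
    return chunks

# Demo with a whitespace "tokenizer" so the sketch is self-contained;
# a real setup would reuse the embedding model's tokenizer instead, e.g.:
#   tok = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
#   encode, decode = tok.tokenize, tok.convert_tokens_to_string
chunks = split_by_tokens(
    "one two three four five six seven",
    encode=str.split,
    decode=" ".join,
    tokens_per_chunk=4,
    overlap=1,
)
# → ["one two three four", "four five six seven"]
```

Because splitting only needs `encode`/`decode`, serving it from the same process that already holds the embedder's tokenizer would avoid loading anything twice.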
Your contribution
I lack knowledge in the AI domain, so I can't contribute an implementation myself.