infinity
Add a TextSplitter in LangChain that shares the embedding model's tokenizer
Feature request
Have you ever thought of adding an API endpoint that could also serve as a TextSplitter? It would remove the need to load the same model into memory twice: once for the text chunker and once for the embedder.
https://python.langchain.com/docs/modules/data_connection/document_transformers/split_by_token#sentencetransformers
Motivation
Create a LangChain TextSplitter that uses the embedding model's tokenizer to chunk long documents.
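To illustrate the idea, here is a minimal sketch of tokenizer-based chunking. The function name, parameters, and the whitespace "tokenizer" in the demo are all hypothetical; in a real setup, `encode`/`decode` would come from the embedding model's own tokenizer (e.g. loaded via `transformers.AutoTokenizer`, which fetches only the tokenizer files, not the model weights):

```python
from typing import Callable, List

def split_by_tokens(
    text: str,
    encode: Callable[[str], List[str]],
    decode: Callable[[List[str]], str],
    tokens_per_chunk: int = 256,
    overlap: int = 32,
) -> List[str]:
    """Chunk `text` into windows of at most `tokens_per_chunk` tokens,
    with `overlap` tokens shared between consecutive chunks."""
    tokens = encode(text)
    step = tokens_per_chunk - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + tokens_per_chunk]
        chunks.append(decode(window))
        if start + tokens_per_chunk >= len(tokens):
            break
    return chunks

# Demo with a whitespace "tokenizer" so the sketch is self-contained;
# a real setup would reuse the embedding model's tokenizer instead, e.g.:
#   tok = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
#   encode, decode = tok.tokenize, tok.convert_tokens_to_string
chunks = split_by_tokens(
    "one two three four five six seven",
    encode=str.split,
    decode=" ".join,
    tokens_per_chunk=4,
    overlap=1,
)
# → ["one two three four", "four five six seven"]
```

Because splitting only needs `encode`/`decode`, serving it from the same process that already holds the embedder's tokenizer would avoid loading anything twice.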
Your contribution
I lack knowledge in the AI domain, so I can't contribute an implementation myself.