Pluggable splitters
🚀 The feature
I notice that all the loaders hard-code their splitters. Would it be possible to make the splitter injectable, so you can customise it per use case and/or roll your own?
Basically something like:
await app.addLoader(new WebLoader({
urlOrContent: 'https://www.forbes.com/profile/elon-musk',
splitter: SemanticSplitter
}));
Or something of that sort 🤷
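A minimal sketch of how an injectable splitter could work. It assumes a hypothetical `Splitter` interface with a single `split(text)` method; `Splitter`, `FixedSizeSplitter`, `ParagraphSplitter`, `SketchLoader` and its `splitter` option are illustrative names, not the library's actual API. The default of 1000-character chunks is also an arbitrary assumption.

```typescript
// Hypothetical contract: anything with a split() method can be injected.
interface Splitter {
  split(text: string): string[];
}

// Hypothetical default strategy: fixed-size character windows with optional overlap.
class FixedSizeSplitter implements Splitter {
  private readonly chunkSize: number;
  private readonly overlap: number;

  constructor(chunkSize = 1000, overlap = 0) {
    if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
    this.chunkSize = chunkSize;
    this.overlap = overlap;
  }

  split(text: string): string[] {
    const chunks: string[] = [];
    const step = this.chunkSize - this.overlap;
    for (let start = 0; start < text.length; start += step) {
      chunks.push(text.slice(start, start + this.chunkSize));
      // Stop once the current window already reaches the end of the text.
      if (start + this.chunkSize >= text.length) break;
    }
    return chunks;
  }
}

// Hypothetical custom strategy: one chunk per blank-line-separated paragraph.
class ParagraphSplitter implements Splitter {
  split(text: string): string[] {
    return text
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
  }
}

interface SketchLoaderOptions {
  content: string;
  splitter?: Splitter; // optional: the loader falls back to its default when omitted
}

// Hypothetical loader illustrating the injection point (NOT the library's WebLoader).
class SketchLoader {
  private readonly content: string;
  private readonly splitter: Splitter;

  constructor(options: SketchLoaderOptions) {
    this.content = options.content;
    this.splitter = options.splitter ?? new FixedSizeSplitter(1000, 0);
  }

  getChunks(): string[] {
    return this.splitter.split(this.content);
  }
}

// Usage: same content, two different chunking strategies.
const text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
const byParagraph = new SketchLoader({ content: text, splitter: new ParagraphSplitter() });
console.log(byParagraph.getChunks().length); // 3
const byDefault = new SketchLoader({ content: text });
console.log(byDefault.getChunks().length); // 1 (text is shorter than 1000 characters)
```

Keeping the splitter optional means existing callers keep the current hard-coded behaviour, while anyone with a better strategy (semantic, recursive, markdown-aware) only has to implement one method.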
Motivation, pitch
Chunking/splitting is one of the most important parts of the RAG process, as it directly affects the quality of embeddings and retrieval. There are several common approaches, each of which performs slightly better depending on the source document. There is also a lot of research in this space, so making this customisable would be a big win for the community, especially given that most existing solutions to this problem in the JS world are seriously lacking.
This issue is stale because it has been open for 14 days with no activity.
This issue was closed because it has been inactive for 30 days since being marked as stale.
@alexborisov Can you reopen the issue? I'm getting started and have the same question.
Sorry for the delay in getting back to you. This should be available in the next release. I will keep you posted.
This issue is stale because it has been open for 14 days with no activity.