gpt4all

Published 20 hours ago


Provide new chunking strategies in localdocs

Open manyoso opened this issue 1 year ago • 4 comments

Currently we do a very simple character/word-based chunking. We should extend our chunking strategies, possibly to include:

Recursive Character Chunking
Token Based Chunking
Document Specific Chunking (HTML, MD, Python, CPP, etc)
Semantic Chunking
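
To make the first item concrete, here is a minimal sketch of recursive character chunking in Python. It is only an illustration of the general technique, not the current or proposed localdocs implementation. The function name `recursive_chunk`, the `max_chars` parameter, and the default separator order are all assumptions made for this example.

```python
from typing import List, Sequence


def recursive_chunk(
    text: str,
    max_chars: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: tries the coarsest separator first (paragraphs),
    and only recurses to finer separators (lines, then words) for pieces
    that are still too long.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every max_chars characters.
        cuts = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
        return [c for c in cuts if c.strip()]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: split it with the finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "alpha beta gamma\n\ndelta epsilon\nzeta eta theta iota"
    for chunk in recursive_chunk(sample, max_chars=16):
        print(repr(chunk))
```

A token-based variant could presumably reuse the same structure by measuring length in tokens from the embedding model's tokenizer instead of `len()`, and a document-specific variant by swapping in separators suited to the format, such as Markdown headings.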

Here is some possible literature:

https://research.trychroma.com/evaluating-chunking
https://www.sagacify.com/news/a-guide-to-chunking-strategies-for-retrieval-augmented-generation-rag
https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d

Jul 10 '24 14:07 manyoso

Labels

enhancement

chat

local-docs
