gpt4all
Provide new chunking strategies in localdocs
Currently, localdocs uses a very simple character/word-based chunking scheme. We should add further chunking strategies, possibly including:
- Recursive Character Chunking
- Token-Based Chunking
- Document-Specific Chunking (HTML, Markdown, Python, C++, etc.)
- Semantic Chunking
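To make the first item in the list above concrete, here is a minimal sketch of recursive character chunking. It is only an illustration of the idea, not a proposal for the actual localdocs code: the function name `recursive_chunk`, the `max_chars` parameter, and the default separator order are all assumptions made for this example. The approach is to split on the coarsest separator first (paragraph breaks), greedily merge neighbouring pieces while they fit, and fall back to finer separators only for pieces that are still too long.

```python
from typing import List, Sequence


def recursive_chunk(
    text: str,
    max_chars: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut at fixed width.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Try to merge this piece into the chunk being built.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        # The merged chunk would be too long: flush what we have.
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta gamma\n\ndelta epsilon\nzeta eta theta iota"
for chunk in recursive_chunk(sample, max_chars=20):
    print(repr(chunk))
```

A token-based variant could keep the same structure and replace `len(...)` with a tokenizer's token count, so the limit matches the embedding model's context rather than a character budget.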
Here is some possible literature:
- https://research.trychroma.com/evaluating-chunking
- https://www.sagacify.com/news/a-guide-to-chunking-strategies-for-retrieval-augmented-generation-rag
- https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d