llama_index
Chunking by paragraph
Is there a way to chunk by paragraph when creating an index?
If not, would this feature be considered as potentially viable to include inside this project? If so, I would be happy to contribute.
Hey @MyIsaak, the current text splitting logic in LlamaIndex is fairly naive.
Currently, if you want to explicitly split by paragraphs, you can either 1) use unstructured.io (https://llamahub.ai/l/file-unstructured) or 2) use a LangChain text splitter and plug it into GPT Index.
We would love a contribution that adds direct support for this in LlamaIndex. It should be very straightforward.
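For reference, here is a rough sketch of what paragraph splitting could look like in plain Python. It is not LlamaIndex's or LangChain's API: the function name `split_paragraphs` is hypothetical, and it assumes paragraphs are separated by one or more blank lines.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split text into paragraphs on blank lines (hypothetical helper)."""
    # Normalize Windows and old Mac line endings so the regex only sees "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a newline, optional whitespace, then another newline.
    parts = re.split(r"\n\s*\n", normalized)
    # Trim each chunk and drop any that are empty.
    return [part.strip() for part in parts if part.strip()]


sample = "First paragraph,\nstill first.\n\nSecond paragraph.\n\n\n  \nThird."
chunks = split_paragraphs(sample)
for chunk in chunks:
    print(repr(chunk))
```

Each resulting chunk could then be passed to whatever document or node type the index expects. Very long paragraphs might still need a secondary split to fit a token limit.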
Thanks for sharing the links. Not sure how unstructured.io could benefit from a text splitter. However, I noticed LangChain has a class of text splitters with a well-defined interface. I'll open an issue on their repo.