llama_index

Chunking by paragraph

Open myisaak opened this issue 1 year ago • 2 comments

Is there a way to chunk by paragraph when creating an index?

If not, would this feature be considered viable to include in this project? If so, I would be happy to contribute.

myisaak, Mar 18 '23 14:03

Hey @MyIsaak, the current text splitting logic in LlamaIndex is fairly naive.

Currently, if you want to split explicitly by paragraphs, you can either 1) use unstructured.io (https://llamahub.ai/l/file-unstructured) or 2) use a LangChain text splitter and plug it into GPT Index.
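For illustration, here is a minimal sketch of paragraph chunking in plain Python, assuming paragraphs are separated by blank lines. The names `split_paragraphs` and `chunk_by_paragraph` and the `max_chars` parameter are hypothetical and are not part of LlamaIndex, LangChain, or unstructured; the resulting chunks would still need to be wrapped in whatever document objects the index expects.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines, dropping empty paragraphs."""
    # A paragraph break is a newline, optional whitespace, then another newline.
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars.

    Hypothetical helper: a paragraph longer than max_chars is kept whole
    as its own chunk rather than being cut mid-paragraph.
    """
    chunks: list[str] = []
    current = ""
    for para in split_paragraphs(text):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Setting `max_chars` smaller than the shortest paragraph yields one chunk per paragraph; larger values merge neighbouring paragraphs without ever splitting one in the middle.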

We would love a contribution that adds direct support for this in LlamaIndex. It should be very straightforward.

Disiok, Mar 18 '23 16:03

Thanks for sharing the links. I'm not sure how unstructured.io could benefit from a text splitter. However, I noticed that LangChain has a class of text splitters with a well-defined interface. I'll open an issue on their repo.

myisaak, Mar 21 '23 17:03