langchain
experimental: add max_chunk_size to SemanticChunker
Description:
This PR adds a max_chunk_size parameter to the SemanticChunker class. When set, max_chunk_size ensures that no chunk exceeds the specified size by splitting any oversized chunk into smaller pieces. This keeps chunk sizes manageable for downstream use.
Issue: Fixes #18014
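A minimal sketch of the post-processing the parameter implies, assuming a simple fixed-width split of oversized chunks. The function name `enforce_max_chunk_size` and the slicing strategy are illustrative assumptions, not the PR's actual implementation:

```python
def enforce_max_chunk_size(chunks: list[str], max_chunk_size: int) -> list[str]:
    """Split any chunk longer than max_chunk_size into fixed-size pieces.

    Hypothetical helper illustrating the behaviour the PR describes;
    the real logic would live inside SemanticChunker.split_text.
    """
    result = []
    for chunk in chunks:
        # Emit the chunk in slices of at most max_chunk_size characters.
        for start in range(0, len(chunk), max_chunk_size):
            result.append(chunk[start:start + max_chunk_size])
    return result


chunks = enforce_max_chunk_size(["abcdef", "gh"], max_chunk_size=4)
print(chunks)  # ['abcd', 'ef', 'gh']
```

A real implementation might prefer to split on sentence or separator boundaries rather than mid-word, but the guarantee is the same: no returned chunk exceeds the configured size.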
It would be really helpful if this feature gets added. For now, the SemanticChunker returns very large chunks in some cases, and the only way to limit that is to create a custom class, I guess?
Can we please get this in @hwchase17 ? thanks!
Waiting for this feature to be implemented.
closing and feel free to reopen against the langchain-experimental repo (this package moved)! https://github.com/langchain-ai/langchain-experimental
Regarding max_chunk_size: wouldn't it be more effective to just pass the output of SemanticChunker to one of the other text splitters that follows a strict chunk-size strategy? That way the user can decide which strategy to use to keep the chunks below a given size. This PR would be prescriptive, if I understand correctly.
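The composition suggested above can be sketched without pulling in langchain itself. Here both splitters are hypothetical stand-ins (the paragraph splitter for SemanticChunker, the fixed-width one for a strict-size splitter such as CharacterTextSplitter); the point is only the pipeline shape, where the user chooses the second stage's strategy:

```python
def semantic_split(text: str) -> list[str]:
    # Stand-in for SemanticChunker.split_text: split on blank lines,
    # so chunk boundaries follow meaning, not size.
    return [p for p in text.split("\n\n") if p]


def size_capped_split(chunks: list[str], chunk_size: int) -> list[str]:
    # Stand-in for a strict chunk-size splitter: re-split any chunk
    # that exceeds chunk_size. Swap in whatever strategy you prefer.
    out = []
    for chunk in chunks:
        out.extend(chunk[i:i + chunk_size] for i in range(0, len(chunk), chunk_size))
    return out


text = "first paragraph\n\nsecond, much longer paragraph"
capped = size_capped_split(semantic_split(text), chunk_size=10)
print(capped)
```

Chaining the two stages keeps SemanticChunker focused on semantic boundaries while the size policy stays in the user's hands, which is the non-prescriptive design argued for here.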