experimental: add max_chunk_size to SemanticChunker

Open RafaelXokito opened this issue 1 year ago • 4 comments

Description: This PR adds a max_chunk_size parameter to the SemanticChunker class. The max_chunk_size ensures that no chunk exceeds the specified size, splitting larger chunks accordingly. This feature enhances the chunking process by maintaining manageable chunk sizes.

Issue: Fixes #18014

RafaelXokito avatar Jul 17 '24 11:07 RafaelXokito
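
For readers who have not seen the diff, the behavior described above can be sketched roughly as follows. This is not the PR's actual code. The helper name `enforce_max_chunk_size` is hypothetical, and measuring size in characters and preferring to cut at whitespace are both assumptions made for illustration.

```python
from typing import List


def enforce_max_chunk_size(chunks: List[str], max_chunk_size: int) -> List[str]:
    """Hypothetical helper: split any chunk longer than max_chunk_size characters."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")

    result: List[str] = []
    for chunk in chunks:
        # Chunks that already fit are passed through unchanged.
        if len(chunk) <= max_chunk_size:
            result.append(chunk)
            continue

        start = 0
        while start < len(chunk):
            end = min(start + max_chunk_size, len(chunk))
            if end < len(chunk):
                # Assumption: prefer cutting at the last space inside the window
                # so words stay intact; fall back to a hard cut otherwise.
                cut = chunk.rfind(" ", start, end)
                if cut > start:
                    end = cut
            piece = chunk[start:end].strip()
            if piece:
                result.append(piece)
            start = end
    return result


if __name__ == "__main__":
    semantic_chunks = ["aaaa bbbb cccc", "short"]
    for piece in enforce_max_chunk_size(semantic_chunks, max_chunk_size=6):
        print(repr(piece))
```

Every returned piece is at most `max_chunk_size` characters long, and chunks that were already small enough are left untouched.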

It would be really helpful if this feature were added. For now, SemanticChunker returns very large chunks in some cases, and the only way to limit that is to create a custom class, I guess?

SatyamMattoo avatar Sep 04 '24 11:09 SatyamMattoo

Can we please get this in @hwchase17 ? thanks!

gustavz avatar Sep 05 '24 13:09 gustavz

Waiting for this feature to be implemented.

repoofideas avatar Sep 10 '24 18:09 repoofideas

Closing; feel free to reopen against the langchain-experimental repo (this package moved)! https://github.com/langchain-ai/langchain-experimental

Regarding max_chunk_size, wouldn't it be more effective to just pass the output of SemanticChunker to one of the other text splitters that follows a strict chunk-size strategy? That way the user can decide which strategy to use to keep the chunks below a given size! This PR would be prescriptive, if I understand correctly.

efriis avatar Sep 26 '24 02:09 efriis
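
The two-stage approach suggested in the last comment can be sketched without any library dependency. Everything named here is a hypothetical stand-in: `split_on_blank_lines` plays the role of the semantic splitter, `fixed_width` plays the role of a strict size-based splitter, and `chain_splitters` is just glue. In a real pipeline the two callables would presumably be the `split_text` methods of a SemanticChunker and of whichever size-bounded splitter the user prefers.

```python
from typing import Callable, List

# Any function that turns one string into a list of strings.
Splitter = Callable[[str], List[str]]


def chain_splitters(text: str, first: Splitter, second: Splitter) -> List[str]:
    """Run `first` on the text, then run `second` on each resulting chunk."""
    out: List[str] = []
    for chunk in first(text):
        # A strict splitter returns small chunks unchanged and cuts large ones.
        out.extend(second(chunk))
    return out


def split_on_blank_lines(text: str) -> List[str]:
    """Stand-in for a semantic splitter: one chunk per paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def fixed_width(size: int) -> Splitter:
    """Stand-in for a strict splitter: hard cut every `size` characters."""
    def split(text: str) -> List[str]:
        return [text[i:i + size] for i in range(0, len(text), size)]
    return split


if __name__ == "__main__":
    text = "short one\n\n" + "x" * 25
    for piece in chain_splitters(text, split_on_blank_lines, fixed_width(10)):
        print(len(piece), repr(piece))
```

Because the second stage is an ordinary argument, swapping in a different size strategy needs no change to the first stage, which is the flexibility the comment argues for.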