[Bug]
📦 Platform
Other
📦 Deployment mode
server db(lobe-chat-database image)
📌 Version
Latest
💻 Operating System
Windows
🌐 Browser
Chrome
🐛 Bug Description
When I chunk files, they are not split by semantic blocks. Every time, I get chunks of equal length.
But a recursive text splitter or Markdown splitter should split on headers or lines.
Maybe I am missing a setting, but I can't find anything like this.
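To illustrate the difference being described, here is a minimal sketch contrasting fixed-length chunking with header-based chunking. It is not lobe-chat's implementation; the function names `split_fixed` and `split_by_headers` and the sample document are hypothetical, and the header splitter is deliberately simple (it does not account for `#` lines inside fenced code blocks).

```python
import re


def split_fixed(text: str, size: int) -> list[str]:
    """Fixed-length chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_by_headers(text: str) -> list[str]:
    """Header-based chunking: start a new chunk at each Markdown ATX header."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # A line such as "# Title" or "## Section" opens a new chunk.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty after stripping whitespace.
    return [c for c in chunks if c]


# Hypothetical sample document, used only for illustration.
doc = (
    "# Intro\n"
    "Short text.\n"
    "\n"
    "## Details\n"
    "A longer paragraph with more words in it.\n"
)

fixed_chunks = split_fixed(doc, 20)
header_chunks = split_by_headers(doc)
```

With this input, `split_fixed` cuts through the middle of sentences and headers, while `split_by_headers` returns one chunk per section, which is the behavior expected from a Markdown-aware splitter.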
📷 Recurrence Steps
No response
🚦 Expected Behavior
No response
📝 Additional Information
No response
👀 @dab512
Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.
The equal-length chunks you are seeing appear to be the default behavior in the lobe-chat framework. As of now, the repository has no documentation or configuration setting that controls whether files are chunked by semantic blocks or headers, so the output may not match the semantic or header-based splitting you expect.
There is a known limitation where the chatbot processes only a limited portion of the chunks, which might affect how the data is handled [1]. The logic for the different file types and their loaders is implemented in the ChunkingLoader class, but it does not detail any settings for semantic-block chunking [2].
Settings for this might not be explicitly defined in the current version. You might need to explore the codebase further or wait for a future update that addresses these limitations.