Knowledge segmentation cannot set the maximum length of segments.
Self Checks
- [x] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.3.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
1. In the knowledge segmentation settings, the maximum segment length cannot be set, and an error is shown when clicking Preview.
2. Clicking the increase and decrease controls here has no effect.
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
I found some similar threads that might be relevant to your issue:
- Maximum chunk length setting can't control the size of the chunk: this discussion is open and addresses issues with controlling chunk size in the knowledge segmentation feature [1].
- How to modify the maximum length of knowledge base document segmentation?: this is another open discussion that explores how to adjust the maximum segment length [2].
- Question about segment length: this thread is open and discusses segment length settings [3].
Regarding your issue, the maximum segment length in Dify is controlled by the INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH configuration setting. By default, the maximum segment length is set to 500 tokens, but it can be configured up to 4000 tokens. To change this setting, you need to modify the .env file by setting INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=4000 and then restart Docker Compose [4][5].
If the UI controls for adjusting segment length are unresponsive, it might be related to configuration settings or a bug in the interface. Unfortunately, I couldn't find specific documentation on troubleshooting UI responsiveness issues, but ensuring that your configuration settings are correctly applied and restarting the application might help resolve the problem.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
I have set INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH to 4000, but after restarting Docker, it seems that this parameter is not effective.
# Maximum length of segmentation tokens for indexing
INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=4000
The INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH parameter in the Dify codebase is constrained by a system-level enforcement that limits the segment length to a maximum of 1000 tokens. Even if you set a higher value in the .env file, the system will not apply it because the valid range for segment length is between 50 and 1000 tokens [1].
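The range check described above can be sketched roughly as follows. This is only an illustration and not Dify's actual code: the helper name `validate_max_tokens`, the `MIN_SEGMENT_TOKENS` constant, and the fallback of 1000 are assumptions based on the figures quoted in this thread.

```python
import os
from typing import Mapping, Optional

MIN_SEGMENT_TOKENS = 50  # lower bound quoted in this thread (assumption)


def validate_max_tokens(max_tokens: int, env: Optional[Mapping[str, str]] = None) -> None:
    """Raise ValueError if max_tokens is outside the allowed range.

    Hypothetical helper: the upper bound is read from
    INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH and falls back to 1000,
    the limit quoted in this thread.
    """
    env = os.environ if env is None else env
    upper = int(env.get("INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH", "1000"))
    if not MIN_SEGMENT_TOKENS <= max_tokens <= upper:
        raise ValueError(
            f"max_tokens must be between {MIN_SEGMENT_TOKENS} and {upper}, got {max_tokens}"
        )
```

Under these assumptions, a request for 4000 tokens is rejected unless the environment variable actually visible to the process has been raised to at least 4000.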
I hit a similar issue when importing a document into a knowledge base in parent-child mode: Dify raises the error "分段最大长度不能大于1000" ("The maximum segment length cannot be greater than 1000"). The error disappears after the user changes the maximum tokens in both general mode and parent-child mode, but it can still occur when adding a new document to an existing knowledge base (if parent-child mode is chosen, the general-mode maximum tokens can no longer be modified at that point). It seems that Dify validates the general-mode maximum tokens limit by default, even when general mode is not going to be used.
How was this issue finally resolved? I have been experiencing this problem since upgrading to 1.3.1.
I think this bug occurs because the maximum segment length check is hard-coded to use the maximum segment length of general mode. So when you upload a new file to an existing knowledge base that uses, for instance, parent-child mode, there is no option to set the general-mode maximum segment length, and the default general-mode length is 1024.
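The suspected behaviour can be illustrated with a small sketch. Everything here is hypothetical (the `ProcessRule` class, both validator functions, and the `LIMIT` constant are invented for illustration and are not taken from the Dify codebase); it only shows how checking the general-mode value in every mode would reject a parent-child upload that never touched that value.

```python
from dataclasses import dataclass

LIMIT = 1000  # assumed value of INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH


@dataclass
class ProcessRule:
    mode: str                       # "general" or "parent-child"
    general_max_tokens: int = 1024  # general-mode default reported in this thread
    parent_max_tokens: int = 1000   # hypothetical parent-child setting


def validate_always_general(rule: ProcessRule) -> bool:
    """Suspected behaviour: the general-mode value is checked in every mode."""
    return rule.general_max_tokens <= LIMIT


def validate_by_mode(rule: ProcessRule) -> bool:
    """Alternative: check only the value that belongs to the selected mode."""
    if rule.mode == "general":
        return rule.general_max_tokens <= LIMIT
    return rule.parent_max_tokens <= LIMIT


rule = ProcessRule(mode="parent-child")
print(validate_always_general(rule))  # False: untouched general default 1024 exceeds 1000
print(validate_by_mode(rule))         # True: parent-child value 1000 is within the limit
```

If this guess is right, it would also explain why setting the general-mode value to 1000 once, or raising the configured limit above 1024, makes the error go away.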
I had this issue too.
I created a new knowledge base and first used parent-child mode with a maximum segment length of 1000, and got the same error.
Then I switched to general mode and set the maximum segment length to 1000, and it worked.
Next I switched back to parent-child mode with the same maximum segment length of 1000, and now it works.
What is the status of this bug? It still does not seem to be fixed as of 1.4.2.
Check the INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH setting in your .env file; if it is 1000, you need to increase it.
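A quick way to confirm which value the .env file actually contains is a small parser like the one below. It is a minimal sketch: the helper name `read_env_value` is hypothetical, the `.env` path is assumed to be in the directory Docker Compose is run from, and it only handles simple `KEY=value` lines.

```python
from pathlib import Path
from typing import Optional

KEY = "INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH"


def read_env_value(text: str, key: str = KEY) -> Optional[str]:
    """Return the last uncommented assignment of key in .env-style text."""
    value = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = raw.strip().strip('"').strip("'")
    return value


if __name__ == "__main__":
    env_file = Path(".env")  # assumed location of the Docker .env file
    if env_file.exists():
        print(KEY, "=", read_env_value(env_file.read_text(encoding="utf-8")))
```

The function returns the last matching assignment so that a duplicated key further down the file is not overlooked.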