
Knowledge segmentation cannot set the maximum length of segments.

Open • xiangzongliang opened this issue 8 months ago • 6 comments

Self Checks

  • [x] This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.3.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

1. The maximum segment length cannot be set in knowledge segmentation, and an error is shown when clicking Preview.

2. Clicking the increase/decrease buttons here has no effect.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

xiangzongliang, Apr 29 '25 04:04

I found some similar threads that might be relevant to your issue:

  1. Maximum chunk length setting can't control the size of the chunk - This discussion is open and addresses issues with controlling chunk size in the knowledge segmentation feature [1].

  2. How to modify the maximum length of knowledge base document segmentation? - This is another open discussion that explores how to adjust the maximum segment length [2].

  3. Question about segment length - This thread is open and discusses segment length settings [3].

Regarding your issue, the maximum segment length in Dify is controlled by the INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH configuration setting. By default, the maximum segment length is set to 500 tokens, but it can be configured up to 4000 tokens. To change this setting, you need to modify the .env file by setting INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=4000 and then restart Docker Compose [4][5].
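The .env edit described above can be scripted. This is a minimal Python sketch that assumes a plain KEY=VALUE .env format (no `export` prefix, no multi-line values); `set_env_value` is a hypothetical helper, not part of Dify:

```python
import re


def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key` set to `value` (hypothetical helper).

    Replaces every uncommented `key=...` line; appends the key if it is absent.
    Assumes simple KEY=VALUE lines only.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(env_text):
        # Use a function so backslashes in `value` are not treated as regex escapes.
        return pattern.sub(lambda _m: new_line, env_text)
    # Key not present (or only commented out): append it on its own line.
    if env_text and not env_text.endswith("\n"):
        env_text += "\n"
    return env_text + new_line + "\n"


if __name__ == "__main__":
    before = (
        "# Maximum length of segmentation tokens for indexing\n"
        "INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=1000\n"
    )
    print(set_env_value(before, "INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH", "4000"))
```

After writing the updated text back to the file, restart Docker Compose as described above.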

If the UI controls for adjusting segment length are unresponsive, it might be related to configuration settings or a bug in the interface. Unfortunately, I couldn't find specific documentation on troubleshooting UI responsiveness issues, but ensuring that your configuration settings are correctly applied and restarting the application might help resolve the problem.


dosubot[bot], Apr 29 '25 04:04

I have set INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH to 4000, but after restarting Docker, the parameter seems to have no effect.

# Maximum length of segmentation tokens for indexing 
INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=4000

xiangzongliang avatar Apr 29 '25 05:04 xiangzongliang

The INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH parameter in the Dify codebase is subject to a system-level limit of 1000 tokens. Even if you set a higher value in the .env file, the system will not apply it, because the valid range for segment length is 50 to 1000 tokens [1].
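The constraint described in this reply amounts to a simple range check. This is a minimal Python sketch assuming the 50 to 1000 range stated above; `validate_max_segment_length` and its error text are illustrative, not Dify's actual code:

```python
MIN_SEGMENT_TOKENS = 50    # lower bound as stated in the reply above (assumed)
MAX_SEGMENT_TOKENS = 1000  # upper bound as stated in the reply above (assumed)


def validate_max_segment_length(value: int,
                                lower: int = MIN_SEGMENT_TOKENS,
                                upper: int = MAX_SEGMENT_TOKENS) -> int:
    """Return `value` unchanged if it lies in [lower, upper]; otherwise raise ValueError."""
    if not lower <= value <= upper:
        raise ValueError(
            f"maximum segment length must be between {lower} and {upper}, got {value}"
        )
    return value


if __name__ == "__main__":
    print(validate_max_segment_length(800))  # 800
    try:
        validate_max_segment_length(4000)
    except ValueError as err:
        print(err)  # maximum segment length must be between 50 and 1000, got 4000
```

Under this model, raising the .env value to 4000 changes nothing while the check's upper bound stays fixed at 1000, which would be consistent with the behavior reported above.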


dosubot[bot] avatar Apr 29 '25 05:04 dosubot[bot]

A similar issue occurs when importing a document into a knowledge base in parent-child mode: Dify raises the error "分段最大长度不能大于1000" ("maximum segment length cannot exceed 1000"). The error disappears after the user changes the maximum tokens in general mode as well as in parent-child mode, but it may still occur when adding a new document to the existing knowledge base (if parent-child mode is chosen, the user cannot modify the general-mode maximum tokens this time). It seems Dify validates the maximum token limit of general mode by default, even when general mode is not going to be used.

nili1988, May 08 '25 08:05

How was this issue finally resolved? I have been experiencing this problem since upgrading to 1.3.1.

RamyHuang, May 18 '25 06:05

I think this bug is caused by the maximum segment length being hard-coded to the general-mode maximum segment length. So when you upload a new file to an existing knowledge base that uses, for instance, parent-child mode, there is no option to set the general-mode maximum segment length, and the general-mode default is 1024.
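The suspected cause can be illustrated with a small model. This is a minimal Python sketch assuming, as the two comments above suggest, that the general-mode length is validated even when parent-child mode is selected; the `ProcessRule` fields, the mode names, and the 1000 limit are hypothetical stand-ins, not Dify's real data model:

```python
from dataclasses import dataclass

LIMIT = 1000  # hypothetical upper bound, taken from the error message in this thread


@dataclass
class ProcessRule:
    mode: str                # "general" or "parent_child" (hypothetical names)
    general_max_tokens: int  # general-mode length; stays at its default when hidden in the UI
    parent_max_tokens: int   # parent-child length chosen by the user


def errors_buggy(rule: ProcessRule) -> list[str]:
    """Suspected behavior: the general-mode value is always checked."""
    errors = []
    if rule.general_max_tokens > LIMIT:
        errors.append(f"general: {rule.general_max_tokens} > {LIMIT}")
    if rule.mode == "parent_child" and rule.parent_max_tokens > LIMIT:
        errors.append(f"parent_child: {rule.parent_max_tokens} > {LIMIT}")
    return errors


def errors_mode_aware(rule: ProcessRule) -> list[str]:
    """Alternative: only the settings of the selected mode are checked."""
    value = rule.general_max_tokens if rule.mode == "general" else rule.parent_max_tokens
    return [f"{rule.mode}: {value} > {LIMIT}"] if value > LIMIT else []


if __name__ == "__main__":
    # Parent-child mode with 1000 tokens, while the hidden general-mode default is 1024.
    rule = ProcessRule(mode="parent_child", general_max_tokens=1024, parent_max_tokens=1000)
    print(errors_buggy(rule))       # ['general: 1024 > 1000']
    print(errors_mode_aware(rule))  # []
```

In this model, first setting the general-mode value to 1000 (the workaround described in the next steps of this comment) makes `errors_buggy` return an empty list as well.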

I had this issue too.

I created a new knowledge base, first used parent-child mode with a maximum segment length of 1000, and got the same issue.

Then I switched to general mode and set the maximum segment length to 1000, and it worked.

Next, I switched back to parent-child mode with the same maximum segment length of 1000, and now it works.

joyatcloudfall, May 20 '25 03:05

What is the status of this bug? It seems it still hasn't been fixed as of 1.4.2.

tianlingchen, Jun 12 '25 06:06

Check the INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH setting in your .env file; if it is 1000, you need to increase it.
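The check suggested in this comment can be scripted. This is a minimal Python sketch assuming a plain KEY=VALUE .env file with simplified parsing (the last uncommented assignment wins, a `#` starts an inline comment, surrounding quotes are stripped); `read_env_int` is a hypothetical helper, and 1000 is the value mentioned in the comment:

```python
from typing import Optional


def read_env_int(env_text: str, key: str) -> Optional[int]:
    """Return the integer value of the last uncommented `key=...` line, or None if absent.

    Simplified parsing (assumption): raises ValueError if the value is not an integer.
    """
    value = None
    for raw in env_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments, blank lines, and non-assignments
        name, _, rhs = line.partition("=")
        if name.strip() == key:
            # Drop an inline comment and surrounding quotes, e.g. KEY="4000"  # note
            rhs = rhs.split("#", 1)[0].strip().strip('"').strip("'")
            value = int(rhs)
    return value


if __name__ == "__main__":
    env = (
        "# Maximum length of segmentation tokens for indexing\n"
        "INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=1000\n"
    )
    current = read_env_int(env, "INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH")
    if current is not None and current <= 1000:
        print(f"INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH is {current}; consider raising it")
```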

samanhappy, Jun 12 '25 07:06