ragflow

[Bug]: The size of PDF chunks is not limited, which can lead to extremely large chunks that exceed the limitations of the LLM in extreme cases.

Open chinamerp opened this issue 1 year ago • 2 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch name

main

Commit ID

4cda40c

Other environment information

No response

Actual behavior

chunk limited to 256

Expected behavior

No response

Steps to reproduce

Upload and parse a PDF that contains a table and a picture.

Additional information

No response

chinamerp avatar May 23 '24 11:05 chinamerp

Which chunking method are you referring to? If you're not using 'One', you can limit chunk size through the task page size setting here: [image]

It limits the number of pages handled by one task.
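To illustrate what a per-task page limit means, here is a minimal sketch, not RAGFlow's actual code. The function name `split_into_page_tasks` and its parameters are hypothetical, and the default of 12 pages is taken from a later comment in this thread.

```python
def split_into_page_tasks(total_pages: int, task_page_size: int = 12) -> list[tuple[int, int]]:
    """Return half-open (start, end) page ranges, each covering at most
    task_page_size pages. Hypothetical helper, for illustration only."""
    if task_page_size < 1:
        raise ValueError("task_page_size must be at least 1")
    return [
        (start, min(start + task_page_size, total_pages))
        for start in range(0, total_pages, task_page_size)
    ]


# A 30-page PDF with the assumed default of 12 pages per task
print(split_into_page_tasks(30))
```

Note that this only bounds how many pages one task parses. It does not by itself bound the size of the chunks produced from those pages.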

KevinHuSh avatar May 24 '24 00:05 KevinHuSh

A PDF file is parsed and split into chunks; each chunk's size should be limited.

chinamerp avatar May 24 '24 01:05 chinamerp

A PDF file is parsed and split into chunks; each chunk's size should be limited.

I see and agree.
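One way the requested limit could look is sketched below. This is an assumption-laden illustration, not the project's implementation: `cap_chunks` is a hypothetical name, tokens are approximated by whitespace-separated words, and the default of 256 assumes the "256" in the report is a token count.

```python
def cap_chunks(pieces: list[str], max_tokens: int = 256) -> list[str]:
    """Pack parsed text pieces into chunks of at most max_tokens words.

    Hypothetical helper: a real tokenizer would replace str.split().
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    chunks: list[str] = []
    current: list[str] = []
    for piece in pieces:
        words = piece.split()
        # An oversized piece (for example a large table) is cut into
        # windows of at most max_tokens words instead of passing through whole.
        for i in range(0, len(words), max_tokens):
            window = words[i:i + max_tokens]
            # Flush the current chunk if adding this window would exceed the cap.
            if len(current) + len(window) > max_tokens:
                chunks.append(" ".join(current))
                current = []
            current.extend(window)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Small cap to make the behavior visible
print(cap_chunks(["a b c", "d e", "f g h i j k l"], max_tokens=4))
```

With a cap like this, no chunk can exceed the limit regardless of how large a single parsed table or section is, at the cost of possibly cutting a table mid-row.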

KevinHuSh avatar May 28 '24 02:05 KevinHuSh

Each chunk of a PDF is limited to 12 pages by default.

yuzhichang avatar Nov 26 '24 09:11 yuzhichang