Multi-column PDFs cannot be segmented
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing? Tell me about your story.
While using the knowledge base document segmentation feature, I found that when I upload PDF documents, multi-column PDFs often cannot be recognized, whereas single-column PDFs work without any problem. Does Dify have any plans to support the recognition and processing of multi-column PDFs?
2. Additional context or comments
When the knowledge base segments a multi-column PDF, the resulting segments are often blank or invalid, and no content is recognized at all. However, many files genuinely use a multi-column layout, and most of them are documents published by authoritative institutions or government bodies. I need the segmentation feature to support multi-column PDF documents.
3. Can you help us with this feature?
- [ ] I am interested in contributing to this feature.
@JohnJyong Hello, I would like to ask about recognition of multi-column PDF files. Do you have any plans for this on your end? The PDFs in question do not contain images or tables, although supporting those as well would of course be even better.