ragflow
[Feature Request]: Support Arabic PDFs
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Submissions with non-English titles will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Is your feature request related to a problem?
I’m working with Arabic PDFs. When I parse them with the DeepDoc parser using the paper chunking method, the extracted text comes out in left-to-right (LTR) order instead of the right-to-left (RTL) order Arabic uses. The text is scrambled and the resulting chunks are unreadable.
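Until the parser handles this, one possible workaround is to post-process the extracted text. The sketch below assumes the extractor returns each Arabic line in visual order (glyphs stored left to right, as they appear on the page). It reverses RTL lines, restores the order of embedded LTR runs (Latin letters and digits), and then applies NFKC to fold the Arabic presentation forms that PDFs often contain back into base letters. The function names are hypothetical and this is not part of RAGFlow or DeepDoc; it is also a simplification, not the full Unicode Bidirectional Algorithm, so mixed-direction lines with punctuation between LTR characters (for example `3.14`) may still come out wrong.

```python
import unicodedata

# Bidi classes kept in left-to-right order inside an RTL line:
# Latin letters (L), European digits (EN), Arabic-Indic digits (AN).
LTR_CLASSES = {"L", "EN", "AN"}
# Strong right-to-left classes: Hebrew-type (R) and Arabic-type (AL) letters.
RTL_CLASSES = {"R", "AL"}


def is_rtl_line(line: str) -> bool:
    """Heuristic: treat a line as RTL if it has more strong RTL than strong LTR characters."""
    rtl = sum(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in line)
    ltr = sum(unicodedata.bidirectional(ch) == "L" for ch in line)
    return rtl > ltr


def visual_to_logical(line: str) -> str:
    """Hypothetical helper: convert one line from visual order to logical order.

    Reverses the whole line, then re-reverses runs of LTR characters so
    numbers and Latin words read correctly. Not the full bidi algorithm.
    """
    if not is_rtl_line(line):
        return line  # LTR lines are left untouched
    out, run = [], []
    for ch in line[::-1]:
        if unicodedata.bidirectional(ch) in LTR_CLASSES:
            run.append(ch)  # collect an LTR run, which is now backwards
        else:
            out.extend(reversed(run))  # flush the run in its original order
            run = []
            out.append(ch)
    out.extend(reversed(run))
    # Fold presentation forms (e.g. the lam-alef ligature) into base letters.
    # Done after reordering so ligatures decompose in logical order.
    return unicodedata.normalize("NFKC", "".join(out))


def fix_page_text(text: str) -> str:
    """Hypothetical helper: apply visual_to_logical to every line of a page."""
    return "\n".join(visual_to_logical(line) for line in text.splitlines())
```

For example, a line that should read `مرحبا 2024` but was extracted visually as `2024 ابحرم` is restored by `visual_to_logical`. Whether this is needed at all depends on how the PDF stores its text; PDFs that already store Arabic in logical order would be scrambled by this step, so it should only be applied when the reversal is actually observed.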
Describe the feature you'd like
Support for Arabic PDFs, so that right-to-left text is extracted in the correct reading order.
Describe implementation you've considered
No response
Documentation, adoption, use case
Additional information
No response
Information retrieval for Arabic requires a dedicated design: with a standard implementation, neither full-text search nor embedding-based retrieval performs well. Do you have any plans or insights regarding Arabic-specific IR?
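As one illustration of what a dedicated design could involve on the full-text side, a common first step in Arabic IR is light orthographic normalization, applied identically to documents and queries so that spelling variants match. The sketch below is an assumption, not anything RAGFlow currently does: it folds presentation forms with NFKC, strips diacritics and tatweel, unifies alef variants, and maps alef maqsura to ya and ta marbuta to ha. These foldings are conventional choices, but whether each one helps depends on the corpus, and normalization alone does not address Arabic morphology (stemming) or embedding quality.

```python
import re
import unicodedata

# Diacritics (tanwin, short vowels, shadda, sukun: U+064B..U+0652),
# superscript alef (U+0670) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Alef with madda, hamza above, hamza below, and alef wasla.
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625\u0671]")


def normalize_arabic(text: str) -> str:
    """Hypothetical helper: light Arabic normalization for indexing and querying."""
    text = unicodedata.normalize("NFKC", text)  # fold presentation forms
    text = DIACRITICS.sub("", text)             # drop short vowels and tatweel
    text = ALEF_VARIANTS.sub("\u0627", text)    # unify alef forms to bare alef
    text = text.replace("\u0649", "\u064A")     # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")     # ta marbuta -> ha
    return text
```

With this, a vocalized query such as `مُحَمَّد` matches the unvocalized `محمد` in the index, and `أحمد` matches `احمد`.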