[Feature Request]: Support Arabic PDFs

Open karthik-v-b opened this issue 2 weeks ago • 1 comment

Self Checks

[x] I have searched for existing issues, including closed ones.
[x] I confirm that I am using English to submit this report (Language Policy).
[x] Non-English title submissions will be closed directly (Language Policy).
[x] Please do not modify this template :) and fill in all the required fields.

Is your feature request related to a problem?

I’m working with Arabic PDFs. When I parse them using Deepdoc Parser with the paper chunking method, the extracted text comes out in LTR (left-to-right) order instead of the RTL (right-to-left) order that Arabic requires. The result is scrambled text and unreadable chunks.
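To make the symptom concrete, here is a minimal sketch of the kind of reordering I would expect, using only the Python standard library. It is not how Deepdoc Parser works and it is not a full implementation of the Unicode Bidirectional Algorithm; the function names are hypothetical. It assumes the extractor emits each line in visual (left-to-right) order, and it only handles Arabic text mixed with simple Latin words and ASCII digits.

```python
import re
import unicodedata

# Runs of Latin letters / ASCII digits (optionally joined by . , : / -)
# that must keep their own left-to-right order inside an RTL line.
_LTR_RUN = re.compile(r"[A-Za-z0-9]+(?:[.,:/\-][A-Za-z0-9]+)*")


def is_mostly_rtl(line: str) -> bool:
    """Return True if most strongly-directional characters are RTL."""
    # "L" = strong left-to-right, "R" = strong right-to-left, "AL" = Arabic letter.
    strong = [
        unicodedata.bidirectional(ch)
        for ch in line
        if unicodedata.bidirectional(ch) in ("L", "R", "AL")
    ]
    if not strong:
        return False
    rtl_count = sum(1 for cls in strong if cls in ("R", "AL"))
    return rtl_count * 2 > len(strong)


def visual_to_logical(line: str) -> str:
    """Naively convert a visually ordered RTL line to logical (reading) order.

    Simplification: reverse the whole line, then restore embedded
    Latin/digit runs. Bracket mirroring, Arabic-Indic digits, and nested
    embeddings are NOT handled.
    """
    if not is_mostly_rtl(line):
        return line  # leave LTR lines untouched
    reversed_line = line[::-1]
    return _LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)


# A line as a naive extractor might emit it (visual order), written with
# escapes so that editors do not reorder it on screen.
visual = "12 \u0645\u0642\u0631 PDF \u0641\u0644\u0645"
logical = visual_to_logical(visual)
```

A real fix would more likely apply a complete bidi implementation (or read the logical order directly from the PDF where available) before chunking; the sketch only shows the difference between the two orders.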

Describe the feature you'd like

Add support for Arabic PDFs so that the extracted text keeps the correct RTL reading order.

Describe implementation you've considered

No response

Documentation, adoption, use case

Additional information

No response

Dec 09 '25 06:12 karthik-v-b

Information retrieval for Arabic requires a dedicated design; neither full-text search nor embedding-based retrieval performs well with a standard implementation. Do you have any plans or insights regarding Arabic-specific IR?

Dec 09 '25 08:12 yingfeng