Extracting Arabic characters

Open mirhmousavi opened this issue 8 years ago • 3 comments

pdf/reader doesn't correctly extract Arabic characters. Any suggestions?

Mar 05 '17 19:03 mirhmousavi

Unfortunately I don't speak Arabic and I don't have any sample PDFs, so I'm not familiar with the ways Arabic can be stored in a PDF.

If you can make a sample file with 1-2 lines of Arabic text, I'd be happy to take a quick look at it to try and understand how much effort would be involved in adding support.

Mar 05 '17 19:03 yob

Take http://pajohesh.qomedu.ir/ar/100.pdf as an example. If it takes much effort, I am glad to contribute and add this feature with your help.

Mar 05 '17 21:03 mirhmousavi

Here is the output I get when running pdf_text from pdf-reader 2.0.0 with the PDF you linked:

100.txt

Do you get something similar? Can you help me understand what the issue with it is?

Mar 05 '17 22:03 yob
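
For anyone trying to describe what is wrong with the extracted text, a small diagnostic may help. The sketch below is not part of pdf-reader: `arabic_summary` and `normalize_arabic` are hypothetical helper names, and it assumes (unconfirmed in this thread) that the problem could involve Arabic presentation-form code points or replacement characters in the output. It uses only the Ruby standard library.

```ruby
# Hypothetical helpers, not part of pdf-reader.
# Assumption: the extracted text is a UTF-8 string.

# Standard Arabic block (the letters a reader would normally expect).
ARABIC_BASE = /[\u0600-\u06FF]/
# Arabic Presentation Forms-A and Forms-B (positional glyph variants).
ARABIC_PRESENTATION = /[\uFB50-\uFDFF\uFE70-\uFEFC]/

# Count which kinds of Arabic-related code points appear in the text.
def arabic_summary(text)
  {
    base_letters: text.scan(ARABIC_BASE).size,
    presentation_forms: text.scan(ARABIC_PRESENTATION).size,
    replacement_chars: text.count("\uFFFD")
  }
end

# Map presentation forms back to base letters via NFKC normalization.
# This does not reorder characters, so it cannot fix reversed text.
def normalize_arabic(text)
  text.unicode_normalize(:nfkc)
end
```

Running `arabic_summary(File.read("100.txt", encoding: "UTF-8"))` on the attached output would show whether the text contains mostly base letters, presentation forms, or replacement characters, which could make the bug report more specific.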