Amit Dovev comments

Results: 538 comments of Amit Dovev

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

@tbadran > But please note that words are not reversed while viewing the PDF because it contains the original image with text layer. > I mean when you copy text...

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

@roozgar You can try training Tesseract using the regular engine. Use the wiki and see #169. I really don't know how good the result will be for Arabic. Like...

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

Tom, Look at the original jpg. Lines 2 and 4 in Google Chrome look quite similar to lines 2 and 3 in the original jpg. First word in line 3...

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

Again, in Google Chromium. If I mark the first two lines in the PDF + first word in line 3, copy the (invisible) text, paste it to a text file,...

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

@jbreiden I didn't understand you. In one comment you talk about Hebrew, and in another you refer only to Arabic. Is Hebrew displayed correctly in Adobe Reader?

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

Please make sure that any change you make does not cause a regression in the Chrome PDF viewer or OS X Preview. Thanks for your work!

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

Maybe explicitly using Unicode bidi control characters can help?
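
To make the suggestion above concrete, here is a minimal sketch in Python of wrapping a right-to-left line in explicit Unicode bidi control characters (RLE, U+202B, and PDF, U+202C). The helper names `is_rtl` and `wrap_rtl` are hypothetical and are not part of Tesseract. Whether a given PDF viewer honors these characters in the invisible text layer is not something this sketch can confirm.

```python
import unicodedata

RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def is_rtl(text):
    """Return True if the first strongly directional character is RTL."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-like or Arabic letters
            return True
        if bidi_class == "L":  # Latin and other left-to-right letters
            return False
    return False  # no strong character found


def wrap_rtl(line):
    """Wrap an RTL line in RLE ... PDF; leave other lines unchanged."""
    return f"{RLE}{line}{PDF}" if is_rtl(line) else line
```

For example, `wrap_rtl` leaves a Latin-only line untouched and surrounds a Hebrew or Arabic line with the two control characters, so that a consumer applying the Unicode bidirectional algorithm treats the line as a right-to-left run.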

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

@jbreiden, any progress? Which approach did you choose? Personally, I care about our Hebrew support.

Non-linear grayscale normalization for layout analyse and/or text recognition

Hi @JKamlah, Leptonica has some built-in grayscale normalization functions, maybe we can also use them. https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/adaptmap.c Here are some examples that demonstrate how to use them to improve thresholding using...
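
As a rough illustration of what grayscale background normalization does before thresholding, here is a minimal pure-Python sketch. It is not Leptonica's implementation and does not call Leptonica (functions such as `pixBackgroundNormSimple` live in the linked `adaptmap.c`). The function name `normalize_background`, the tile size, and the choice of the brightest pixel per tile as the background estimate are all assumptions made for illustration.

```python
def normalize_background(img, tile=2, target=200):
    """Scale each tile so its estimated background maps to `target`.

    `img` is a list of rows of 8-bit gray values. The background of a
    tile is assumed to be its brightest pixel (a simplification).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            ys = range(ty, min(ty + tile, h))
            xs = range(tx, min(tx + tile, w))
            # Estimate the local background; avoid division by zero.
            bg = max(1, max(img[y][x] for y in ys for x in xs))
            for y in ys:
                for x in xs:
                    out[y][x] = min(255, round(img[y][x] * target / bg))
    return out
```

After this step, unevenly lit regions share a similar background level, so a single global threshold separates text from paper more reliably than it would on the raw image.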

Non-linear grayscale normalization for layout analyse and/or text recognition

CC: @bertsky