Sandro Mani
Thank you for working on this! I haven't yet looked at Flatpak in depth, so I'm currently not really able to participate, but if there are specific issues, I'm happy...
I fear this is a general issue with PoDoFo and complex scripts, or rather, more work is needed to have PoDoFo handle these correctly.
Actually, isn't it just a matter of picking the right font? I tried with a test image you sent me a while ago, installed the Lohit Devanagari font, selected that...
Ah I see. Do you have any idea how tesseract handles this?
Yeah I read the same thread - as I read it, PoDoFo isn't capable of handling it for you, but it should be possible to handle it with custom code...
But looking at the tesseract source, in particular [pdfrenderer.cpp](https://github.com/tesseract-ocr/tesseract/blob/master/api/pdfrenderer.cpp), I see no traces of pango or harfbuzz. It would be sufficient to figure out the low-level blocks that tesseract adds...
Okay I'll take a look when I find a moment.
@Shreeshrii I've added a QPrinter backend for PDF export, please give it a try.
Here you go:
- 32 bit: https://smani.fedorapeople.org/tmp/gImageReader_3.2.3_qt5_i686.exe
- 64 bit: https://smani.fedorapeople.org/tmp/gImageReader_3.2.3_qt5_x86_64.exe
1. Correct, hOCR is always page-based (due to the nature of the hOCR format). While clearly a subset of a document can also be seen as an hOCR page,...