rmast

184 comments by rmast

If I invert the complete image via [pinetools.com/invert-image-colors](https://pinetools.com/invert-image-colors) and repeat the steps, all text appears correct in Tesseract and sharp in the resulting PDF, even though the image still contains both inverted and non-inverted text...

I found a workaround to get the OCR correct. Create a file `tess.cfg` containing:

```
tessedit_do_invert True
```

and call:

```
ocrmypdf -l nld 175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg --tesseract-config tess.cfg ocrkwaliteit.pdf
```

The...
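To make the workaround repeatable, the two manual steps can be scripted. This is a minimal sketch, assuming `ocrmypdf` is on the `PATH`; the helper names `write_tess_config`, `build_ocrmypdf_command` and `run_ocr` are hypothetical and only wrap the exact config line and command-line options shown in the workaround.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def write_tess_config(path: Path) -> Path:
    # Same single setting as the workaround: let Tesseract also try inverted text.
    path.write_text("tessedit_do_invert True\n", encoding="utf-8")
    return path


def build_ocrmypdf_command(image: str, output_pdf: str, config: Path,
                           lang: str = "nld") -> list[str]:
    # Hypothetical helper: reproduces the ocrmypdf call from the workaround as an argv list.
    return ["ocrmypdf", "-l", lang, "--tesseract-config", str(config), image, output_pdf]


def run_ocr(cmd: list[str]) -> None:
    # Hypothetical helper: actually invokes ocrmypdf; raises if it exits non-zero.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_tess_config(Path(tmp) / "tess.cfg")
        cmd = build_ocrmypdf_command(
            "175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg", "ocrkwaliteit.pdf", cfg
        )
        print(shlex.join(cmd))
        # run_ocr(cmd)  # uncomment to run ocrmypdf while tess.cfg still exists
```

Building the command as a list (instead of one shell string) avoids quoting problems with file names; `shlex.join` is only used to print it.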

[The new parameter Stefan Weil suggests](https://github.com/tesseract-ocr/tesseract/pull/3141) gives the same error.

When I look at the hOCR extracted from this "array"-containing PDF, it contains the "wis-clear" part at the top right of the image twice, unfortunately both times with confidence 100. I...

You can already see that these are separately recognized words: for example, the third coordinate of the first 'w' differs from that of the second. But Stefan says this is not by design,...
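Duplicated words like the double "wis-clear" can be spotted mechanically by grouping the hOCR `ocrx_word` spans by text and comparing their `bbox` and `x_wconf` properties. The sketch below uses only the standard-library `html.parser`; it assumes word spans are not nested inside other `span`s, and the sample hOCR (coordinates included) is invented for illustration, not taken from the actual PDF.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, bbox, confidence) for every ocrx_word span in an hOCR page."""

    def __init__(self):
        super().__init__()
        self.words = []        # list of (text, bbox, conf)
        self._current = None   # (bbox, conf) of the word span being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or "").split():
            bbox, conf = None, None
            # hOCR stores properties in the title attribute, separated by ';'
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if not parts:
                    continue
                if parts[0] == "bbox":
                    bbox = tuple(int(v) for v in parts[1:5])
                elif parts[0] == "x_wconf":
                    conf = int(parts[1])
            self._current = (bbox, conf)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            bbox, conf = self._current
            self.words.append(("".join(self._text).strip(), bbox, conf))
            self._current = None


def find_duplicate_words(hocr: str):
    """Return {text: [(bbox, conf), ...]} for words recognized more than once."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    by_text = defaultdict(list)
    for text, bbox, conf in parser.words:
        by_text[text].append((bbox, conf))
    return {t: occ for t, occ in by_text.items() if len(occ) > 1}


# Invented sample: the same word twice, only the third bbox coordinate differs.
SAMPLE_HOCR = """
<span class='ocr_line' title='bbox 0 0 200 20'>
  <span class='ocrx_word' title='bbox 10 5 60 18; x_wconf 93'>print</span>
  <span class='ocrx_word' title='bbox 120 5 180 18; x_wconf 100'>wis-clear</span>
  <span class='ocrx_word' title='bbox 120 5 182 18; x_wconf 100'>wis-clear</span>
</span>
"""

if __name__ == "__main__":
    print(find_duplicate_words(SAMPLE_HOCR))
```

Grouping by text alone is deliberately crude; a stricter check would also require the bounding boxes to overlap.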

I couldn't get plain Tesseract to read print/wis-clear correctly on its own. Looking around for a solution, I stumbled upon [EasyOCR](https://github.com/JaidedAI/EasyOCR), which doesn't have hOCR output but comes with something similar...
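Since EasyOCR lacks hOCR output, its results would need converting before tools that expect hOCR can use them. A minimal sketch, assuming the default `readtext` result layout of `(points, text, confidence)` tuples, where `points` are the four corners of the detected box and `confidence` lies between 0 and 1. The function name `easyocr_to_hocr_words` is hypothetical, and it emits only `ocrx_word` spans, not a complete hOCR document with page and line elements.

```python
from html import escape


def easyocr_to_hocr_words(results):
    """Convert (points, text, confidence) tuples into minimal hOCR ocrx_word spans."""
    spans = []
    for i, (points, text, conf) in enumerate(results, start=1):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # hOCR bbox is an axis-aligned box "x0 y0 x1 y1", so take the extremes
        # of the (possibly rotated) quadrilateral.
        bbox = (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))
        wconf = round(conf * 100)  # hOCR x_wconf is on a 0-100 scale
        spans.append(
            f"<span class='ocrx_word' id='word_{i}' "
            f"title='bbox {bbox[0]} {bbox[1]} {bbox[2]} {bbox[3]}; x_wconf {wconf}'>"
            f"{escape(text)}</span>"
        )
    return "\n".join(spans)


if __name__ == "__main__":
    # Invented example result in the assumed EasyOCR tuple layout.
    fake = [([[10, 5], [60, 5], [60, 18], [10, 18]], "print/wis-clear", 0.87)]
    print(easyocr_to_hocr_words(fake))
```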

Playing around with the new You.com YouChat, which is currently free to use: you can ask questions that are answered ChatGPT-style, but with references and actual results from...

If you could recognize the font, and it is freely available, you could render the invisible text visibly in that font and remove the jb2 image. One of the...

The second commit fixes this error: https://github.com/internetarchive/archive-pdf-tools/issues/55#issuecomment-1166449630

> btw, I think I fixed this in [3c20a46](https://github.com/internetarchive/archive-pdf-tools/commit/3c20a464f53ca0524268e35b998036d18b380b45) - can you confirm? Without resetting up and retesting it I read through the issues to see what we were trying...