rmast

184 comments by rmast

I've done some testing with this branch merged into the current main. It's still not perfect.

With oem 0:
![image](https://user-images.githubusercontent.com/3341558/184516308-20b1bdec-1e17-46b8-94b4-703f775626e7.png)

With LSTM:
![image](https://user-images.githubusercontent.com/3341558/184516328-7588397c-aabb-40fc-b7c0-38c8979b5eda.png)

Original image:
![out1496-3078-212-39](https://user-images.githubusercontent.com/3341558/184516477-5f62d74f-8cdb-434e-b808-ad39b90f21bc.png)

Program code of bounding...

> Please remove the two unused code lines and fix the indentation of the remaining line.

Done

> ~Is that code only used when making boxes, or is it also...

@wollmers wrote:

> Sorry, mismatched this PR with [PR 3787](https://github.com/tesseract-ocr/tesseract/pull/3787).

Yes, at first I couldn't understand what my rotation fix had to do with your response, but as I...

What about using MRC (Mixed Raster Content) compression to keep the file visually as close as possible to the original while greatly reducing its size, as @jbarlow83 mentioned here: https://github.com/jbarlow83/OCRmyPDF/issues/836#issuecomment-922560147

> (We do not do...

@blaueente @v217 I saw your input in these issues about introducing MRC into OCRmyPDF:

- https://github.com/ocrmypdf/OCRmyPDF/issues/9
- https://github.com/fritz-hh/OCRmyPDF/issues/88

I understand that license (in)compatibility is holding back progress. I was also looking into didjvu to understand...
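The idea behind MRC is to split a scanned page into a bilevel mask (the text shapes) plus low-resolution foreground and background images, so that each layer can be stored with a codec suited to it, for example a bilevel codec such as JBIG2 for the mask and JPEG for the background. A minimal pure-Python sketch of that split, assuming a grayscale page given as nested lists and a single fixed threshold; `split_mrc` and `compose_mrc` are hypothetical names, not part of OCRmyPDF or didjvu, and real encoders use adaptive thresholding and smarter filling of the holes the text leaves in the background.

```python
from statistics import mean

# Hypothetical representation: grayscale rows, 0 = black, 255 = white.
Image = list[list[int]]


def split_mrc(page: Image, threshold: int = 128, block: int = 2):
    """Split a page into (mask, foreground, background) layers (sketch only)."""
    height, width = len(page), len(page[0])
    # Mask at full resolution: 1 where the pixel is dark enough to count as text.
    mask = [[1 if px < threshold else 0 for px in row] for row in page]
    # Foreground and background at 1/block resolution: within each block,
    # average the text pixels into the foreground and the paper pixels
    # into the background.
    fg: Image = []
    bg: Image = []
    for by in range(0, height, block):
        fg_row, bg_row = [], []
        for bx in range(0, width, block):
            text, paper = [], []
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    (text if mask[y][x] else paper).append(page[y][x])
            # Assumption: blocks with no text (or no paper) get a flat fill;
            # the mask never selects those values, so they cost almost nothing.
            fg_row.append(round(mean(text)) if text else 0)
            bg_row.append(round(mean(paper)) if paper else 255)
        fg.append(fg_row)
        bg.append(bg_row)
    return mask, fg, bg


def compose_mrc(mask: Image, fg: Image, bg: Image, block: int = 2) -> Image:
    """Rebuild the page: foreground colour where the mask is set, else background."""
    return [
        [fg[y // block][x // block] if m else bg[y // block][x // block]
         for x, m in enumerate(row)]
        for y, row in enumerate(mask)
    ]
```

The size saving comes from the layers being very different: the mask is sharp but only one bit per pixel, while the colour layers are stored at reduced resolution, where lossy compression blurs nothing that matters for legibility.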

I tried your script on a newly arrived two-page ABN AMRO letter. The resulting out.pdf is 129 KB, and the letters "ABN AMRO" at the top are quite blurry. DjvuSolo...

> DjVu is a fun comparison but it has the advantage of being able to use image formats that are not supported in PDF.

That's where DjVuToy comes in, that...

No, both are closed source. DjVuSolo 3.1 is a very old, pre-commercial demo of DjVu's capabilities. When they commercialized DjVu, they priced it so high that DjVu...

Here is the result via DjVuSolo 3.1 / DjVuToy 3.06 (Unicode edition), half the size of your result for the COVID health form: [in.pdf](https://github.com/ocrmypdf/OCRmyPDF/files/8633543/in.pdf)

```
rmast@Ubuntu20:~$ pdfimages -list in.pdf
page num type width height color comp bpc...
```
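To compare how the bytes are spread over the layers of two such PDFs, the `pdfimages -list` output can be summed per image type and encoding. A small sketch, assuming poppler's column order (`page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio`) and size values with a `B`/`K`/`M`/`G` suffix; `parse_size` and `image_sizes` are hypothetical helper names.

```python
import re
import subprocess

# Assumption: pdfimages prints sizes such as "512B", "407K" or "1.2M".
_UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)([BKMG])$")


def parse_size(token: str) -> int:
    """Convert a size token such as '407K' into an approximate byte count."""
    match = _SIZE_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognised size token: {token!r}")
    value, unit = match.groups()
    return round(float(value) * _UNITS[unit])


def image_sizes(listing: str) -> dict[str, int]:
    """Sum stored image bytes per 'type/encoding', e.g. 'image/jbig2'."""
    totals: dict[str, int] = {}
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a numeric page number; this skips the
        # header row, the dashed separator and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        key = f"{fields[2]}/{fields[8]}"  # type and enc columns
        totals[key] = totals.get(key, 0) + parse_size(fields[-2])  # "size" column
    return totals


def list_images(pdf_path: str) -> str:
    """Run pdfimages -list (requires poppler-utils on PATH)."""
    result = subprocess.run(["pdfimages", "-list", pdf_path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `image_sizes(list_images("in.pdf"))` shows at a glance how much of the file is the bilevel mask and how much is the background picture.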

Take a look especially at the clarity of the background picture...