rmast
https://github.com/tesseract-ocr/tesseract/issues/3871
Despite all the effort put into inverted text, [my good-quality proof picture](https://user-images.githubusercontent.com/3341558/179373903-ef6cc246-f4e5-4633-a762-ded4dd22708f.jpg) is still not correctly segmented and read. The sentence near the bottom, clearly readable for a human,...
The newest version I found still can't cope with lines that have been split up by different maximum line-size rules, like: The mail system <xxxxxxxxxxxxxxxxxxxxxx>: maildir delivery failed:...
The test with the following statement shows one epoch on the screen, and then seems to keep the GPU and CPU running without any further progress on the screen for...
 results with defaults 0,NULL,NULL in  The KVK on top left should be left as it was, or at least the bottom half of the V should be...
### Bug description When I make the detector detect text in the following image  the preferred dots in the boxes are from the i's in the...
This link tells me ort-inference supports OpenVino: https://github.com/pytorch/ort#-inference "ONNX Runtime for PyTorch supports PyTorch model inference using ONNX Runtime and Intel® OpenVINO™. It is available via the torch-ort-infer python package....
This form: https://www.kvk.nl/download/Formulier-14-wijziging-ondernemings-en-vestigingsgegevens_tcm109-365607.pdf First page saved to JPEG via this site: https://smallpdf.com  Result of the left column is quite readable at the right screen resolution. ``` ocrmypdf --pdfa-image-compression lossless -O0...
I solved issue https://github.com/internetarchive/archive-pdf-tools/issues/52 myself.
Partly anonymized replay of my previous finding on compressing the bank statement while downsampling the foreground, revealing a bug in the foreground binarizer/separator.  Add fg_downsample=12 in compress-pdf-images: mrc_gen = create_mrc_hocr_components(pil_image, hocr_word_data,...