rmast

Results 29 issues of rmast

https://github.com/tesseract-ocr/tesseract/issues/3871

Despite all the effort put into inverted text, [my good-quality proof picture](https://user-images.githubusercontent.com/3341558/179373903-ef6cc246-f4e5-4633-a762-ded4dd22708f.jpg) is still not correctly segmented and read. The sentence near the bottom, which is clearly readable for a human,...

layout analysis

The newest version I found still can't cope with lines that have been split up under different maximum line-length rules, like: The mail system <xxxxxxxxxxxxxxxxxxxxxx>: maildir delivery failed:...

The test with the following statement shows one epoch on the screen, and then seems to keep the GPU and CPU running without any further progress on the screen for...

![175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36](https://user-images.githubusercontent.com/3341558/179575323-234cebaf-bce1-41f3-ace1-9543b6e9e088.jpg) results with defaults 0,NULL,NULL in ![Plaatje achteraf](https://user-images.githubusercontent.com/3341558/179575577-84d89298-3d36-4e40-9f9c-a1f0558381a6.png) The KVK on top left should be left as it was, or at least the bottom half of the V should be...

### Bug description

When I make the detector detect text in the following image ![Brief gemeente 300dpi voorkant](https://github.com/mindee/doctr/assets/3341558/221e61a3-5a2a-4812-847c-b8afe55f31d1) the preferred dots in the boxes are from the i's in the...

type: bug

This link tells me ort-inference supports OpenVINO: https://github.com/pytorch/ort#-inference "ONNX Runtime for PyTorch supports PyTorch model inference using ONNX Runtime and Intel® OpenVINO™. It is available via the torch-ort-infer python package....

This form: https://www.kvk.nl/download/Formulier-14-wijziging-ondernemings-en-vestigingsgegevens_tcm109-365607.pdf First page saved to JPEG via this site: https://smallpdf.com ![0001](https://user-images.githubusercontent.com/3341558/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg) The result for the left column is quite readable at the right screen resolution. ``` ocrmypdf --pdfa-image-compression lossless -O0...

I solved issue https://github.com/internetarchive/archive-pdf-tools/issues/52 myself.

Partly anonymized replay of my previous finding on compressing the bank statement by downsampling the foreground, revealing a bug in the foreground binarizer/separator. ![image](https://user-images.githubusercontent.com/3341558/175774964-5861e95d-0f2d-414e-97b2-8216400cd219.png) Add fg_downsample=12 in compress-pdf-images: mrc_gen = create_mrc_hocr_components(pil_image, hocr_word_data,...