rmast
I'm not convinced it's inversion-related. I think it originates earlier, somewhere segments are propagated into each other, probably during the underline search. If I run this statement wis-clear is...
By the way, this build is compiled without legacy, so the problem is in the new parts.
```
Testing underline on blob at (2149,3149)->(2396,3189), base=3160
Occs:247 247 247
Testing underline on blob at (2149,3103)->(2396,3144), base=3085
Occs:0 0 247
Underlined blob at:Bounding box=(2149,3103)->(2396,3144)
Was:Bounding box=(2149,3103)->(2396,3144)
Segmenting baseline of...
```
I've now pinpointed the disappearing upper bounding box (in bold) from

Block1 textord.cpp
Block 28
Bounding box=(2149,3103)->(2396,3189)
**Bounding box=(2149,3149)->(2396,3189)**
Bounding box=(2149,3103)->(2396,3144)
Bounding box=(2194,3114)->(2237,3137)
Bounding box=(2249,3121)->(2257,3125)
Bounding box=(2269,3114)->(2336,3137)
/Block

as disappearing in textord.cpp // Remove empties....
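To make this kind of comparison less manual, here is a rough Python sketch that parses `Bounding box=(...)->(...)` lines from two debug dumps and reports which boxes vanished in between. The `before` dump is the Block 28 list from this comment. The `after` dump is my assumption of what the list looks like once the bolded box is gone, not real Tesseract output. The helper names (`parse_boxes`, `dropped`, `union`) are made up for illustration, and I'm assuming the coordinates print as (left,bottom)->(right,top).

```python
import re
from typing import NamedTuple


class Box(NamedTuple):
    # Assumed order of Tesseract's printed box: (left,bottom)->(right,top)
    left: int
    bottom: int
    right: int
    top: int


BOX_RE = re.compile(r"Bounding box=\((\d+),(\d+)\)->\((\d+),(\d+)\)")


def parse_boxes(dump: str) -> list[Box]:
    """Extract every 'Bounding box=(x,y)->(x,y)' entry from a debug dump."""
    return [Box(*(int(g) for g in m.groups())) for m in BOX_RE.finditer(dump)]


def dropped(before: list[Box], after: list[Box]) -> list[Box]:
    """Boxes present in the first dump but missing from the second."""
    return [b for b in before if b not in after]


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return Box(min(a.left, b.left), min(a.bottom, b.bottom),
               max(a.right, b.right), max(a.top, b.top))


# Block 28 boxes as listed in the comment.
before = parse_boxes("""
Bounding box=(2149,3103)->(2396,3189)
Bounding box=(2149,3149)->(2396,3189)
Bounding box=(2149,3103)->(2396,3144)
Bounding box=(2194,3114)->(2237,3137)
Bounding box=(2249,3121)->(2257,3125)
Bounding box=(2269,3114)->(2336,3137)
""")

# ASSUMPTION: illustrative "after" dump with the bolded box removed.
after = parse_boxes("""
Bounding box=(2149,3103)->(2396,3189)
Bounding box=(2149,3103)->(2396,3144)
Bounding box=(2194,3114)->(2237,3137)
Bounding box=(2249,3121)->(2257,3125)
Bounding box=(2269,3114)->(2336,3137)
""")

print(dropped(before, after))
# The second and third boxes together span exactly the first one.
print(union(before[1], before[2]) == before[0])
```

The union check is a side observation: the vanished box and the box below it add up exactly to the first box in the list, which fits the idea that the first box is the enclosing row and the other two are its upper and lower halves.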
This might be involved:

B:28 R:1 -- Can't do isolated row stats.
B:28 R:1 -- Inadequate certain spaces.

tesseract -c textord_restore_underlines=1 --dpi 300 -l Latin -c textord_noise_rejrows=0 -c textord_debug_block=28 -c...
Please let us know if you find an open source automatic segmenter that, in general and without supervision, does a better job than Tesseract itself. I guess that would be a hit....
> > > With this image, I get an empty output with all available eng/Latin models.

That's interesting! That makes focusing on the issue easier. I've run it with...
./migneuzn ~/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg > ~/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.uzn

tesseract ~/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg - --psm 4

Gives the same error:

```
[> wis - clear | wis - clear
```

So that's also a possibility to focus...
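For anyone who wants to feed their own zones in the same way, here is a small Python sketch of how one box could be turned into a UZN line. It rests on two assumptions of mine: that each UZN line is `left top width height label` with the origin at the top-left of the image, and that the debug boxes use a bottom-left origin, so the y value has to be flipped using the image height. The function name `box_to_uzn_line` and the image height of 3500 are hypothetical; this is not what `migneuzn` does internally.

```python
def box_to_uzn_line(left: int, bottom: int, right: int, top: int,
                    image_height: int, label: str = "Text") -> str:
    """Format one box as a UZN zone line.

    ASSUMPTION: input box has a bottom-left origin (y grows upward),
    UZN wants 'left top width height label' with a top-left origin.
    """
    width = right - left
    height = top - bottom
    uzn_top = image_height - top  # flip the y axis
    return f"{left} {uzn_top} {width} {height} {label}"


# Block 28's outer box; image_height=3500 is a made-up value for illustration.
line = box_to_uzn_line(2149, 3103, 2396, 3189, image_height=3500)
print(line)
```

The resulting lines would go into a `.uzn` file next to the image with the same base name, as in the commands above.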
> I made my own cut-out of that image, nearly the original block 28 and there was no issue at all recognizing the text correctly:

Unfortunately cutting out the picture...
I tried EasyOCR as a segmenter. Using its segments as UZN on the image or on the inverted image doesn't make a difference. I'm still inclined to dive into the error(s) despite...