tesseract
Isolated character problem
Hi,
A lot of isolated characters are lost in version 4.1, and even in the latest version. Maybe this is because they are relatively small (at 200 DPI).
Do you know where the problem is? Segmentation or connected component filtering?
Environment
Tesseract Version: 4.1, Windows 10
Current Behavior:
Output text:
No. TAG page
07396
Expected Behavior:
No. TAG page 07396 1
Suggested Fix:
I'm also facing the same problem.
Increasing the DPI doesn't help.
In my case, if I highlight only the word "Serie", it works. But if I highlight "Serie" together with the letter "B", it doesn't read anything.
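
One way to check whether segmentation is what drops the isolated characters is to run the same image through several page segmentation modes and compare the text. The sketch below is only a diagnostic aid, not a confirmed fix: it assumes the `tesseract` executable is on the PATH, `page.png` is a hypothetical file name, and the helper names `build_command` and `compare_psm` are made up for this example. It uses only the standard `--psm`, `--dpi` and `-l` command-line options.

```python
import shutil
import subprocess

# Page segmentation modes to compare (descriptions as listed by `tesseract --help-psm`).
PSM_MODES = {
    3: "fully automatic page segmentation, no OSD (default)",
    6: "assume a single uniform block of text",
    11: "sparse text, find as much text as possible in no particular order",
}


def build_command(image_path, psm, dpi=None, lang="eng"):
    """Build the tesseract command line for one run; 'stdout' prints the text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if dpi is not None:
        # Tell tesseract the input resolution explicitly (e.g. 200 for a 200 DPI scan).
        cmd += ["--dpi", str(dpi)]
    return cmd


def compare_psm(image_path, dpi=None):
    """Run tesseract once per mode in PSM_MODES and return {psm: recognized text}."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    results = {}
    for psm in PSM_MODES:
        proc = subprocess.run(
            build_command(image_path, psm, dpi),
            capture_output=True,
            text=True,
        )
        results[psm] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    # Hypothetical input file; replace with the image that loses characters.
    for psm, text in compare_psm("page.png", dpi=200).items():
        print(f"--psm {psm} ({PSM_MODES[psm]}):")
        print(text)
        print("-" * 40)
```

If the missing characters show up under `--psm 11` but not under the default mode, that would point towards layout analysis rather than recognition, though it would not by itself prove where the characters are filtered out.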