Tesseract misses whitelisted characters

Open eicksl opened this issue 3 years ago • 2 comments

Environment

  • Tesseract Version: 5.0.1

  • Platform: Windows 10 (64-bit)

Current Behavior:

The correct output is returned without whitelisting enabled, but an empty string is returned with whitelisting enabled.

Input image: foo

Command: tesseract foo.png out --oem 1 --psm 10

Output: "Q"

Command: tesseract foo.png out --oem 1 --psm 10 -c tessedit_char_whitelist=Q

Output: ""
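
For anyone who wants to compare the two runs side by side, the commands above could be scripted roughly as follows. This is only a sketch, not part of the original report: the helper names `build_command` and `run_ocr` are made up for illustration, and it assumes `tesseract` is on the PATH and `foo.png` is in the working directory. It also passes `stdout` as the output base instead of `out`, so the text is printed rather than written to out.txt.

```python
import subprocess


def build_command(image, whitelist=None):
    """Build the tesseract argument list used in this report (hypothetical helper)."""
    # Assumption: "stdout" as the output base prints the recognized text
    # instead of writing it to out.txt as in the original commands.
    cmd = ["tesseract", image, "stdout", "--oem", "1", "--psm", "10"]
    if whitelist is not None:
        # Same config variable as in the failing command above.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def run_ocr(image, whitelist=None):
    """Run tesseract and return the recognized text with whitespace stripped."""
    result = subprocess.run(
        build_command(image, whitelist),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    plain = run_ocr("foo.png")
    restricted = run_ocr("foo.png", whitelist="Q")
    print(f"without whitelist: {plain!r}")
    print(f"with whitelist:    {restricted!r}")
```

If the behavior described here reproduces, the two printed values should differ even though "Q" is the only whitelisted character.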

Expected Behavior:

Whitelisting should not make it harder for Tesseract to detect permitted characters.

eicksl, Feb 22 '22 21:02

CC: @bertsky

amitdo, Feb 23 '22 11:02

Thanks for the perfect minimal example! Unfortunately, that one evades me.

I'm no longer sure whether commit b45999088ccbda19b57327c05810a8a015ce9a89 was correct in the first place (I remember there were even more pressing problems before), but reverting it does not help either, as https://github.com/bertsky/tesseract/tree/try-different-whitelisting shows.

bertsky, Feb 25 '22 00:02