Tesseract misses whitelisted characters
Environment
- Tesseract Version: 5.0.1
- Platform: Windows 10 (64-bit)
Current Behavior:
The correct output is returned without whitelisting enabled, but an empty string is returned with whitelisting enabled.
Input image:

Command: tesseract foo.png out --oem 1 --psm 10
Output: "Q"
Command: tesseract foo.png out --oem 1 --psm 10 -c tessedit_char_whitelist=Q
Output: ""
Expected Behavior:
Whitelisting should not make it harder for Tesseract to recognize permitted characters.
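For anyone who wants to script the comparison, here is a minimal Python sketch. It assumes `tesseract` is on the PATH and that `foo.png` is the input image from this report. The helper names (`build_command`, `recognize`, `whitelist_regression`) are hypothetical and only for illustration. It passes `stdout` as the output base so the text is printed instead of written to `out.txt`.

```python
import subprocess


def build_command(image, whitelist=None, oem=1, psm=10):
    """Build the tesseract argument list used in this report.

    Passing 'stdout' as the output base makes tesseract print the
    recognized text to standard output.
    """
    cmd = ["tesseract", image, "stdout", "--oem", str(oem), "--psm", str(psm)]
    if whitelist is not None:
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def recognize(image, whitelist=None):
    """Run tesseract and return the recognized text without surrounding whitespace."""
    result = subprocess.run(
        build_command(image, whitelist),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def whitelist_regression(plain, restricted, whitelist):
    """Return True if permitted characters found without the whitelist
    are not reproduced when the whitelist is enabled."""
    expected = "".join(ch for ch in plain if ch in whitelist)
    return expected != "" and restricted != expected


if __name__ == "__main__":
    plain = recognize("foo.png")
    restricted = recognize("foo.png", whitelist="Q")
    print(repr(plain), repr(restricted))
    print("regression:", whitelist_regression(plain, restricted, "Q"))
```

With the outputs reported above ("Q" without the whitelist, "" with it), `whitelist_regression` flags the case as a regression.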
CC: @bertsky
Thanks for the perfect minimal example! Unfortunately, that one evades me.
I'm no longer sure whether b45999088ccbda19b57327c05810a8a015ce9a89 was correct in the first place (I remember there were even more pressing problems before), but reverting it does not help either, as https://github.com/bertsky/tesseract/tree/try-different-whitelisting shows.