tesserocr
`GetTextDirection` + `MapWordConfidences` crash python
import tesserocr
from PIL import Image

image = Image.new('RGB', (100, 100), 255)
with tesserocr.PyTessBaseAPI(r"C:\Program Files\Tesseract-OCR\tessdata", 'chi_sim', 10) as api:
    print(api.SetImage(image))
    print(api.GetTextDirection())  # commenting this line out prevents the crash
    print(api.MapWordConfidences())
output:
None
(0, -0.0)
best_choice != nullptr:Error:Assert failed:in file C:\projects\tesserocr-windows-build\tesseract\src\ccmain\ltrresultiterator.cpp, line 51
additional:
Even if the image contains a character and GetUTF8Text() returns a result, MapWordConfidences() still returns an empty list [] if DetectOS() is called before it.
version:
tesseract 5.3.1
leptonica-1.83.1 (Jun 13 2023, 19:19:21) [MSC v.1935 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.4) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.3.0 : libopenjp2 2.5.0
python 3.11.2
The error in your output comes from tesseract, not tesserocr. It's still worth noting that Recognize() should be called before calling AllWords(); otherwise it will always return an empty list (according to the method's docstring).
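The ordering described in this comment can be sketched as a small helper. This is only an illustration of calling Recognize() before reading word results; it is not claimed to avoid the assertion failure reported above. The helper name word_confidences and the FakeAPI stand-in are hypothetical and are not part of tesserocr; with the real library you would pass a PyTessBaseAPI instance on which SetImage() has already been called.

```python
def word_confidences(api):
    """Run recognition first, then return the (word, confidence) pairs."""
    # Recognize() reports whether recognition succeeded; without a
    # successful run there are no word results to read.
    if not api.Recognize():
        return []
    return api.MapWordConfidences()


class FakeAPI:
    """Hypothetical stand-in that records call order (not part of tesserocr)."""

    def __init__(self):
        self.calls = []

    def Recognize(self):
        self.calls.append("Recognize")
        return True

    def MapWordConfidences(self):
        self.calls.append("MapWordConfidences")
        return [("a", 90)]
```

Using a stand-in object keeps the sketch runnable without a Tesseract installation; the only point it makes is the call order.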
Adding api.Recognize() makes no difference:
import tesserocr
from PIL import Image

image = Image.new('RGB', (100, 100), 255)
with tesserocr.PyTessBaseAPI(r"C:\Program Files\Tesseract-OCR\tessdata", 'chi_sim', 10) as api:
    print(api.SetImage(image))
    api.Recognize()
    print(api.GetTextDirection())  # commenting this line out prevents the crash
    print(api.MapWordConfidences())
The crash reason and error message are the same.
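Because the failed assertion takes the whole interpreter down, one way to keep investigating is to run the repro in a child process and inspect its exit code and output from the parent. The following is a minimal sketch using only the standard library; run_isolated is a hypothetical helper name, and whether the assertion text lands in the captured stderr is an assumption that may depend on the build.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a child interpreter.

    Returns (returncode, stdout, stderr) so that a native crash in the
    child does not terminate the calling process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Passing the repro script above as the snippet string should yield a non-zero return code when the crash occurs, while the parent process keeps running and can log whatever the child printed.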