`GetTextDirection` + `MapWordConfidences` crash python

Open NewUserHa opened this issue 2 years ago • 2 comments

```python
import tesserocr
from PIL import Image

image = Image.new('RGB', (100, 100), 255)
with tesserocr.PyTessBaseAPI(r"C:\Program Files\Tesseract-OCR\tessdata", 'chi_sim', 10) as api:
    print(api.SetImage(image))
    print(api.GetTextDirection())  # commenting this line out stops Python from crashing
    print(api.MapWordConfidences())
```

Output:

```
None
(0, -0.0)
best_choice != nullptr:Error:Assert failed:in file C:\projects\tesserocr-windows-build\tesseract\src\ccmain\ltrresultiterator.cpp, line 51
```

Additional: even if the image contains a character and GetUTF8Text() returns the result, MapWordConfidences() still returns an empty list [] if DetectOS() is called before it.

Version:

```
tesseract 5.3.1
leptonica-1.83.1 (Jun 13 2023, 19:19:21) [MSC v.1935 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.4) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.3.0 : libopenjp2 2.5.0
python 3.11.2
```

NewUserHa, Oct 03 '23 15:10

The error in your output comes from tesseract, not tesserocr. It's worth noting, though, that Recognize() should be called before AllWords(); otherwise it always returns an empty list (according to the method's docstring).
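As a rough sketch of that call order: the helper name `recognized_word_confidences` is made up for illustration and is not part of tesserocr, and `api` is assumed to be an open `tesserocr.PyTessBaseAPI` instance. It only covers the empty-list case and is not expected to change the assertion failure raised inside tesseract.

```python
def recognized_word_confidences(api, image):
    """Set the image, run recognition, then read (word, confidence) pairs.

    `api` is assumed to behave like tesserocr.PyTessBaseAPI; this helper
    is a hypothetical convenience wrapper, not a tesserocr function.
    """
    api.SetImage(image)
    # AllWords() is documented to return an empty list unless Recognize()
    # has been called first, so run recognition before reading words.
    if not api.Recognize():
        # Recognition reported failure; there is nothing to read.
        return []
    return api.MapWordConfidences()
```

Inside a `with tesserocr.PyTessBaseAPI(...) as api:` block this would be called as `recognized_word_confidences(api, image)`.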

sirfz, Oct 04 '23 18:10

Adding api.Recognize() gives the same result.

```python
import tesserocr
from PIL import Image

image = Image.new('RGB', (100, 100), 255)
with tesserocr.PyTessBaseAPI(r"C:\Program Files\Tesseract-OCR\tessdata", 'chi_sim', 10) as api:
    print(api.SetImage(image))
    api.Recognize()
    print(api.GetTextDirection())  # commenting this line out stops Python from crashing
    print(api.MapWordConfidences())
```

The crash and the error message are the same.

NewUserHa, Oct 04 '23 21:10