tesseract CLI reports `Error during processing` and the first character is missing, while pytesseract works well

Open l1t1 opened this issue 1 year ago • 0 comments

Current Behavior

tesseract CLI

D:\>tesseract 7.jpg stdout -l chi_sim
Estimating resolution as 315
���联社电报 <-------------------------------here 4 characters

2023-03-17 20:52 星期五

【华为总裁任正非: 华为2022年研
发经费达238亿美元】

财联社3月17日电,华为总裁任正非日前在"难
题揭榜" 火花奖公司内外的获奖者及出题专家座
谈会上的讲话中透露,华为现在还属于困难时
期,但在前进的道路上并没有停步。2022年华
为的研发经费达238亿美元,未来在前沿探索上
还会继续加大投入。

TMT行业观察  华为最新动态

Error during processing. <-------------------------------here error

7
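One way to narrow this down is to capture the CLI's stdout as raw bytes, so that console rendering is out of the picture, and decode those bytes as UTF-8. The sketch below is only a diagnostic aid: the helper names `decode_ocr_stdout` and `run_tesseract_stdout` are hypothetical, and the idea that console output handling is involved is an untested assumption, not something this report confirms.

```python
import subprocess


def decode_ocr_stdout(raw: bytes) -> str:
    """Decode captured stdout bytes as UTF-8.

    Bytes that are not valid UTF-8 become U+FFFD, so a damaged
    first character is visible instead of raising an exception.
    """
    return raw.decode("utf-8", errors="replace")


def run_tesseract_stdout(image: str, lang: str = "chi_sim"):
    """Run `tesseract <image> stdout -l <lang>` with output captured as bytes.

    Returns (return code, decoded stdout, decoded stderr).
    Assumes the `tesseract` executable is on PATH.
    """
    proc = subprocess.run(
        ["tesseract", image, "stdout", "-l", lang],
        capture_output=True,  # keep bytes away from the console
    )
    return (
        proc.returncode,
        decode_ocr_stdout(proc.stdout),
        proc.stderr.decode("utf-8", errors="replace"),
    )


# Example use, with the image from this report:
# code, text, err = run_tesseract_stdout("7.jpg")
# print(code, repr(text[:5]), err)
```

If the first character is intact in the captured bytes, the loss happens when the text is written to the console; if it is already damaged there, the problem is in the CLI's output itself.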

Expected Behavior

Same output as pytesseract:

D:\>python
Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytesseract
>>> text = pytesseract.image_to_string('7.jpg', lang="chi_sim");print(text)
财联社电报  <-------------------------------here 5 characters 

2023-03-17 20:52 星期五

【华为总裁任正非: 华为2022年研
发经费达238亿美元】

财联社3月17日电,华为总裁任正非日前在"难
题揭榜" 火花奖公司内外的获奖者及出题专家座
谈会上的讲话中透露,华为现在还属于困难时
期,但在前进的道路上并没有停步。2022年华
为的研发经费达238亿美元,未来在前沿探索上
还会继续加大投入。

TMT行业观察  华为最新动态
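For comparison: as far as I know, pytesseract does not read the CLI's stdout. It has tesseract write to an output file and then reads that file as UTF-8. A minimal sketch of the same approach follows; `build_command` and `ocr_via_file` are hypothetical helper names, and the code assumes the `tesseract` executable is on PATH.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(image, out_base, lang="chi_sim"):
    """Build a `tesseract <image> <outputbase> -l <lang>` command line."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_via_file(image, lang="chi_sim"):
    """Run tesseract with a file output base and read the result as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "result"
        subprocess.run(build_command(image, out_base, lang), check=True)
        # tesseract appends ".txt" to the output base for plain-text output
        return out_base.with_suffix(".txt").read_text(encoding="utf-8")


# Example use, with the image from this report:
# print(ocr_via_file("7.jpg"))
```

If this returns the full first line while `stdout` mode does not, the difference lies in the stdout path rather than in recognition.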

Suggested Fix

The tesseract CLI should recognize all characters and not report an error.

tesseract -v

tesseract v5.3.0.20221214
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

Operating System

No response

Other Operating System

windows 7 32bit

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

l1t1, Mar 18 '23 13:03