tesseract CLI reports `Error during processing.` and the first character is missing, while pytesseract works correctly
Current Behavior
tesseract CLI
D:\>tesseract 7.jpg stdout -l chi_sim
Estimating resolution as 315
���联社电报 <-------------------------------here: only 4 characters, the first one is missing
2023-03-17 20:52 星期五
【华为总裁任正非: 华为2022年研
发经费达238亿美元】
财联社3月17日电,华为总裁任正非日前在"难
题揭榜" 火花奖公司内外的获奖者及出题专家座
谈会上的讲话中透露,华为现在还属于困难时
期,但在前进的道路上并没有停步。2022年华
为的研发经费达238亿美元,未来在前沿探索上
还会继续加大投入。
TMT行业观察 华为最新动态
Error during processing. <-------------------------------here: the error
Expected Behavior
Same output as pytesseract:
D:\>python
Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytesseract
>>> text = pytesseract.image_to_string('7.jpg', lang="chi_sim");print(text)
财联社电报 <-------------------------------here: all 5 characters
2023-03-17 20:52 星期五
【华为总裁任正非: 华为2022年研
发经费达238亿美元】
财联社3月17日电,华为总裁任正非日前在"难
题揭榜" 火花奖公司内外的获奖者及出题专家座
谈会上的讲话中透露,华为现在还属于困难时
期,但在前进的道路上并没有停步。2022年华
为的研发经费达238亿美元,未来在前沿探索上
还会继续加大投入。
TMT行业观察 华为最新动态
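To tell whether the first character is lost during recognition or only when the console displays the CLI's output, one could run the CLI from Python, decode its raw stdout bytes as UTF-8, and compare the result line by line with pytesseract's text. This is a minimal diagnostic sketch, assuming `tesseract` is on `PATH` and `7.jpg` is in the working directory; the helper names `run_tesseract_cli` and `first_difference` are hypothetical and not part of tesseract or pytesseract.

```python
import subprocess
from typing import Optional, Tuple


def run_tesseract_cli(image_path: str, lang: str = "chi_sim") -> Tuple[str, int]:
    """Hypothetical helper: run `tesseract IMAGE stdout -l LANG`.

    Returns (text, exit_code). The raw stdout bytes are decoded as UTF-8
    here in Python, so decoding is not left to the console's code page.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True,  # stdout and stderr are captured separately
    )
    # Any byte sequence that is not valid UTF-8 becomes U+FFFD instead of raising.
    text = result.stdout.decode("utf-8", errors="replace")
    return text, result.returncode


def first_difference(a: str, b: str) -> Optional[Tuple[int, str, str]]:
    """Hypothetical helper: find the first line where two OCR outputs differ.

    Returns (line_number, line_in_a, line_in_b), or None if they match.
    Leading/trailing whitespace (including a trailing form feed that
    tesseract may emit as a page separator) is stripped before comparing.
    """
    lines_a = a.strip().splitlines()
    lines_b = b.strip().splitlines()
    for number in range(1, max(len(lines_a), len(lines_b)) + 1):
        # A missing line on either side is compared as an empty string.
        line_a = lines_a[number - 1] if number <= len(lines_a) else ""
        line_b = lines_b[number - 1] if number <= len(lines_b) else ""
        if line_a != line_b:
            return number, line_a, line_b
    return None
```

For example, passing `run_tesseract_cli("7.jpg")[0]` and `pytesseract.image_to_string("7.jpg", lang="chi_sim")` to `first_difference` and getting `None`, while the console still shows `���联社电报`, would suggest the character is lost when the output is displayed rather than during recognition. The exit code returned by `run_tesseract_cli` shows whether the CLI run itself signals a failure.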
Suggested Fix
The tesseract CLI should recognize all characters and should not report an error.
tesseract -v
tesseract v5.3.0.20221214 leptonica-1.78.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0 Found SSE4.1 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0
Operating System
Windows 7 32-bit