Recognize Japanese symbols in two screenshots

Open superbonaci opened this issue 1 year ago • 4 comments

Current Behavior

Tesseract does not recognize the symbols.

Expected Behavior

Recognize the symbols in these two screenshots. Original pictures from Dragon Ball episode 1:

goku1

goku2

After some perspective correction (in case it helps):

goku1-ed

goku2-ed

Suggested Fix

Recognize the symbols.

tesseract -v

tesseract 5.3.2
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5.1) : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.11 : libwebp 1.3.1 : libopenjp2 2.5.0
 Found NEON
 Found libarchive 3.6.2 zlib/1.2.11 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
 Found libcurl/7.88.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.51.0

Operating System

macOS 13 Ventura

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

superbonaci avatar Jul 19 '23 09:07 superbonaci

Tesseract's layout analysis was designed to deal with simple layouts of books, magazines, newspapers and documents.

If Tesseract fails to recognize an image, or some areas of it, clean the image up with a different tool first so that the text is easier for Tesseract to recognize.

https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

In your case, you should give Tesseract just the letters without the frame around them.
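
One way to do that cropping, sketched here with ImageMagick, which is an assumption since the thread names no specific tool. The function name, the file names, and the crop geometry are hypothetical placeholders; the real geometry has to be measured on the actual screenshot, and the resize and threshold values are only starting points.

```shell
# Hypothetical helper: crop away the frame, then clean up the glyphs.
# Assumes ImageMagick 7 (the `magick` command) is installed.
# Steps: crop to the given area, drop the old canvas offset, convert to
# grayscale, upscale 3x, and binarize at 50% brightness.
crop_for_ocr() {
  cfo_in="$1"     # source screenshot (placeholder name, e.g. goku1-ed.png)
  cfo_out="$2"    # cleaned image to hand to Tesseract
  cfo_geom="$3"   # WIDTHxHEIGHT+X+Y of the area inside the frame
  magick "$cfo_in" \
    -crop "$cfo_geom" +repage \
    -colorspace Gray \
    -resize 300% \
    -threshold 50% \
    "$cfo_out"
}
```

A call might look like `crop_for_ocr goku1-ed.png goku1-crop.png 200x200+60+40`, with the geometry replaced by the measured one, followed by running Tesseract on `goku1-crop.png`.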

amitdo avatar Mar 16 '24 22:03 amitdo

No result with the improved picture either:

result

% tesseract -l jpn result.png result.txt
Empty page!!
Empty page!!
% tesseract -l script/Japanese result.png result.txt
Empty page!!
Empty page!!

superbonaci avatar Mar 17 '24 12:03 superbonaci

Did you try different --psm (page segmentation mode) values?
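
A minimal sketch of such a sweep, assuming the `result.png` file and the `jpn` language data from the commands earlier in the thread. The function name `sweep_psm` and the particular list of modes are illustrative choices, not a recommendation from the Tesseract project; `--psm` accepts values 0 through 13.

```shell
# Hypothetical helper: run Tesseract on one image with several page
# segmentation modes and print each result to the terminal.
# Modes tried: 5 = single uniform block of vertically aligned text,
# 6 = single uniform block of text, 7 = single text line,
# 8 = single word, 10 = single character, 13 = raw line.
sweep_psm() {
  sp_image="$1"          # image to recognize
  sp_lang="${2:-jpn}"    # language data, defaults to jpn
  for sp_psm in 5 6 7 8 10 13; do
    echo "== psm $sp_psm =="
    tesseract "$sp_image" stdout -l "$sp_lang" --psm "$sp_psm"
  done
}
```

For example, `sweep_psm result.png` tries every listed mode with `jpn`, and `sweep_psm result.png script/Japanese` repeats the sweep with the other model used above.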

amitdo avatar Mar 17 '24 12:03 amitdo

Still no luck, but Google Lens finds it fine: https://ja.wikipedia.org/wiki/%E5%80%92%E7%A6%8F

superbonaci avatar Mar 17 '24 15:03 superbonaci