tesseract
Recognize Japanese symbols in two screenshots
Current Behavior
Tesseract does not recognize the symbols.
Expected Behavior
Tesseract should recognize the symbols in these two screenshots. The original pictures are from Dragon Ball episode 1:
After some perspective correction (which may help):
Suggested Fix
Recognize the symbols.
tesseract -v
tesseract 5.3.2
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5.1) : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.11 : libwebp 1.3.1 : libopenjp2 2.5.0
Found NEON
Found libarchive 3.6.2 zlib/1.2.11 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
Found libcurl/7.88.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.51.0
Operating System
macOS 13 Ventura
Tesseract's layout analysis was designed to deal with simple layouts of books, magazines, newspapers and documents.
If Tesseract fails to recognize an image, or some areas of it, it is recommended to clean up the image with a different tool first, so that the text is easier for Tesseract to recognize.
https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html
In your case, you should give Tesseract just the letters without the frame around them.
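The cleanup suggested above can be sketched in plain Python. This is a minimal, hypothetical sketch, not part of Tesseract: it assumes the screenshot has already been loaded as rows of 0-255 gray values (for example with an image library such as Pillow), that the characters are darker than the background, and that the frame lies within a fixed `margin` of pixels on each side. A frame that is not axis-aligned would need a different method. All function names are illustrative. The final `pad` step adds a white border back, because the ImproveQuality page notes that text with no border around it can cause problems.

```python
"""Hypothetical cleanup sketch: binarize, strip a frame, crop, re-pad.

Assumes `pixels` is a list of rows of 0-255 gray values with dark text
on a light background, and that the frame fits inside `margin` pixels.
"""
from typing import List

Gray = List[List[int]]


def binarize(pixels: Gray, threshold: int = 128) -> Gray:
    # Map every pixel to pure black (0) or pure white (255).
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


def strip_frame(pixels: Gray, margin: int) -> Gray:
    # Drop `margin` pixels on every side, where the frame is assumed to be.
    inner_rows = pixels[margin:len(pixels) - margin]
    return [row[margin:len(row) - margin] for row in inner_rows]


def crop_to_ink(pixels: Gray) -> Gray:
    # Keep only the bounding box of the black (text) pixels.
    rows = [y for y, row in enumerate(pixels) if 0 in row]
    if not rows:
        return pixels  # blank image: nothing to crop
    cols = [x for row in pixels for x, p in enumerate(row) if p == 0]
    top, bottom = rows[0], rows[-1]
    left, right = min(cols), max(cols)
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


def pad(pixels: Gray, border: int = 10) -> Gray:
    # Surround the text with a white border so it does not touch the edge.
    width = (len(pixels[0]) if pixels else 0) + 2 * border
    body = [[255] * border + row + [255] * border for row in pixels]
    top = [[255] * width for _ in range(border)]
    bottom = [[255] * width for _ in range(border)]
    return top + body + bottom


def clean(pixels: Gray, margin: int, threshold: int = 128, border: int = 10) -> Gray:
    # Full pipeline: black/white, remove frame, tight crop, white border.
    return pad(crop_to_ink(strip_frame(binarize(pixels, threshold), margin)), border)
```

The result would then be written back to an image file and passed to `tesseract` as before.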
No result with the improved picture either:
% tesseract -l jpn result.png result.txt
Empty page!!
Empty page!!
% tesseract -l script/Japanese result.png result.txt
Empty page!!
Empty page!!
Did you try with different psm values?
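Trying different psm values by hand is tedious, so one option is a small script that loops over them. This is a hypothetical sketch: it assumes the `tesseract` binary is on PATH, reuses `result.png` and the two language settings from the commands above, and passes `stdout` as the output base so the text is printed rather than written to a file. It skips psm 0 (orientation and script detection only) and psm 2 (not implemented). psm 10 treats the image as a single character, which may be worth watching if the crop holds only one symbol.

```python
"""Hypothetical sketch: try several --psm values and languages with the
tesseract CLI and report which combinations return any text.

Assumes `tesseract` is on PATH and the jpn and script/Japanese
language data from the issue are installed.
"""
import shutil
import subprocess
from typing import Dict, List, Tuple

# psm 0 is OSD only and psm 2 is not implemented, so sweep 3..13.
PSM_VALUES = range(3, 14)
LANGS = ("jpn", "script/Japanese")


def build_command(image: str, lang: str, psm: int) -> List[str]:
    # "stdout" as the output base prints the recognized text to stdout.
    return ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]


def sweep(image: str) -> Dict[Tuple[str, int], str]:
    # Collect the non-empty results; messages such as "Empty page!!"
    # go to stderr, so they are not mistaken for recognized text.
    results: Dict[Tuple[str, int], str] = {}
    for lang in LANGS:
        for psm in PSM_VALUES:
            proc = subprocess.run(build_command(image, lang, psm),
                                  capture_output=True, text=True)
            text = proc.stdout.strip()
            if text:
                results[(lang, psm)] = text
    return results


def main() -> None:
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH; nothing to do")
        return
    results = sweep("result.png")
    if not results:
        print("no combination returned any text")
    for (lang, psm), text in results.items():
        print(f"-l {lang} --psm {psm}: {text!r}")


if __name__ == "__main__":
    main()
```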
Still no luck, but Google Lens finds it fine: https://ja.wikipedia.org/wiki/%E5%80%92%E7%A6%8F