tesseract
TSV output splits each word by newline AND space
Basic Information
tesseract v5.3.0.20221222
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0
Windows
- [X] Windows 11
- [ ] Windows 10
Current Behavior
The plain-text output is correct:
[..]
Nutrition Facts 4
[..]
Yet with TSV output selected, each word is placed on its own line:
5 1 1 1 1 1 48 0 562 323 76.177887 Nutrition
5 1 1 1 1 2 661 64 358 188 96.668480 Facts
5 1 1 1 1 3 1062 0 60 269 55.497231 4
Expected Behavior
The TSV output should present the text the way the plain-text output does, with the words of a line kept together.
Suggested Fix
Is there a way to omit or combine the entries that differ only in the word_num column? Changing the page segmentation mode (psm) had no effect.
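If the goal is simply one string per text line, the word rows could be merged after the fact. Below is a minimal sketch, not a built-in Tesseract option. It assumes the TSV file starts with the usual header row (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) and that word entries are the rows with level 5. The function name `lines_from_tsv` and the sample data layout are made up for illustration; the three sample rows are the ones quoted above.

```python
import csv
import io


def lines_from_tsv(tsv_text):
    """Join word-level rows (level 5) into one string per text line."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = {}  # dicts keep insertion order, so lines stay in reading order
    for row in reader:
        if row["level"] != "5":
            continue  # skip page, block, paragraph and line rows
        text = (row.get("text") or "").strip()
        if not text:
            continue  # skip empty word entries
        # Words that share page, block, paragraph and line belong together.
        key = (row["page_num"], row["block_num"], row["par_num"], row["line_num"])
        lines.setdefault(key, []).append(text)
    return [" ".join(words) for words in lines.values()]


# Sample built from the rows in this report; the header row is an assumption.
HEADER = ("level page_num block_num par_num line_num word_num "
          "left top width height conf text").split()
ROWS = [
    "5 1 1 1 1 1 48 0 562 323 76.177887 Nutrition".split(),
    "5 1 1 1 1 2 661 64 358 188 96.668480 Facts".split(),
    "5 1 1 1 1 3 1062 0 60 269 55.497231 4".split(),
]
sample = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

print(lines_from_tsv(sample))  # ['Nutrition Facts 4']
```

Grouping on the first four index columns rather than on word_num keeps words from different lines or blocks apart even when their word numbers repeat.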
Other Information
No response
Please provide the input image. Also provide the tsv and txt output files. You can make a zip archive that contains these files, so GitHub will let you upload them.
No feedback from the OP.