
TSV output splits each word by newline AND space

Open • Antsthebul opened this issue 2 years ago • 1 comment

Basic Information

tesseract v5.3.0.20221222
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

Windows

  • [X] Windows 11
  • [ ] Windows 10

Current Behavior

The plain-text output for the file is correct:

[..]
Nutrition Facts 4
[..]

Yet when TSV output is selected, each word is placed on its own line:

5       1       1       1       1       1       48      0       562     323     76.177887       Nutrition
5       1       1       1       1       2       661     64      358     188     96.668480       Facts
5       1       1       1       1       3       1062    0       60      269     55.497231       4

Expected Behavior

The TSV output should present the words grouped per line, similar to the plain-text output.

Suggested Fix

Is there a way to omit or combine the items in the word_num column? Changing the page segmentation mode (psm) had no effect.
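For reference, a minimal post-processing sketch of one way the word rows could be combined after the fact. It assumes the TSV uses the usual column order (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) and that word rows have level 5, as in the rows above. The function name `tsv_to_lines` is hypothetical and not part of Tesseract.

```python
import csv
import io

# Column order assumed to match the header row of Tesseract's TSV output.
TSV_COLUMNS = ["level", "page_num", "block_num", "par_num", "line_num",
               "word_num", "left", "top", "width", "height", "conf", "text"]


def tsv_to_lines(tsv_text):
    """Join word-level rows that share the same page, block, paragraph and line."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    lines = {}  # dicts keep insertion order, so lines stay in reading order
    for row in reader:
        if len(row) < len(TSV_COLUMNS):
            continue  # skip short or empty rows
        rec = dict(zip(TSV_COLUMNS, row))
        if rec["level"] != "5":
            continue  # skips the header row and page/block/paragraph/line rows
        key = (rec["page_num"], rec["block_num"],
               rec["par_num"], rec["line_num"])
        lines.setdefault(key, []).append(rec["text"])
    return [" ".join(words) for words in lines.values()]


# The three word rows from this report, rebuilt with tab separators.
sample = "\n".join([
    "\t".join(["5", "1", "1", "1", "1", "1", "48", "0", "562", "323", "76.177887", "Nutrition"]),
    "\t".join(["5", "1", "1", "1", "1", "2", "661", "64", "358", "188", "96.668480", "Facts"]),
    "\t".join(["5", "1", "1", "1", "1", "3", "1062", "0", "60", "269", "55.497231", "4"]),
])
print(tsv_to_lines(sample))
```

Grouping on the first four index columns after level keeps the per-word boxes and confidences available in the TSV while still allowing line-level text to be reconstructed.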

Other Information

No response

Antsthebul, Dec 26 '22 21:12

Please provide the input image.

Also provide tsv and txt output files. You can make a zip archive that will contain these files, so GitHub will let you upload them.

amitdo, Dec 27 '22 09:12

No feedback from the OP.

amitdo, Jan 13 '23 08:01