
Wrong coordinates with .box when chi_tra_vert*.traineddata is used

Open MORzyuan opened this issue 4 years ago • 2 comments

Hi, many thanks for this fantastic work and to all of you! I am here to report some weird behavior with coordinates when chi_tra_vert_*.traineddata is used.

tesseract 4.1.0
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found SSE
 Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6

ProductName: Mac OS X
ProductVersion: 10.13.6
BuildVersion: 17G65

  1. tesseract with makebox sets every character's X coordinates and width to zero:
     tesseract [--oem 1] chi_tra_vert_test_1.jpg chi_tra_vert_1_test -l chi_tra_vert makebox chi_tra_vert_test_1_result

  2. tesseract with lstmbox fails:
     tesseract [--oem 1] chi_tra_vert_test_2.jpg chi_tra_vert_2_test -l chi_tra_vert lstmbox chi_tra_vert_test_2_result

And here are my questions:

  1. Why do I get wrong coordinates?
  2. Why are the recognized characters correct while their coordinates are wrong?
  3. This is unrelated to the weird cases above, but I noticed that vertical Chinese text is only supported by the 4.x versions, and that 4.x uses only line-level bounding boxes as labeled data. How can tesseract recognize the single characters within a line?
  4. I noticed that there is no GPU training method. It is a little discouraging to train an LSTM-based neural network on CPUs, so any experience (dataset size, time cost, etc.) would really help!

Best!

MORzyuan, Oct 01 '19 10:10

tesseract -v
tesseract 5.0.0-alpha-20201231-111-ge1b9
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found NEON
 Found OpenMP 201511
 Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 liblz4/1.7.1
 Found libcurl/7.58.0 NSS/3.35 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3

[image: chi_tra]

Cropped version of first image above and results of makebox, lstmbox and wordstrbox

(base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract chi_tra.jpg chi_tra -l chi_tra_vert wordstrbox
Tesseract Open Source OCR Engine v5.0.0-alpha-20201231-111-ge1b9 with Leptonica
(base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract chi_tra.png -  -l chi_tra_vert wordstrbox
WordStr 75 2 118 245 0 #保定 易 州 查 學
         119 2 123 245 0
WordStr 6 0 47 332 0 #前 半球 後 十 八 日 即
         48 0 52 332 0
(base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract chi_tra.png -  -l chi_tra_vert lstmbox
保 75 2 123 245 0
定 75 2 123 245 0
  75 2 123 245 0
易 75 2 123 245 0
  75 2 123 245 0
州 75 2 123 245 0
  75 2 123 245 0
查 75 2 123 245 0
  75 2 123 245 0
學 75 2 123 245 0
         75 2 123 245 0
前 6 0 52 332 0
  6 0 52 332 0
半 6 0 52 332 0
球 6 0 52 332 0
  6 0 52 332 0
後 6 0 52 332 0
  6 0 52 332 0
十 6 0 52 332 0
  6 0 52 332 0
八 6 0 52 332 0
  6 0 52 332 0
日 6 0 52 332 0
  6 0 52 332 0
即 6 0 52 332 0
         6 0 52 332 0
(base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract chi_tra.png -  -l chi_tra_vert makebox
保 0 76 0 118 0
定 0 75 0 116 0
易 0 66 0 122 0
州 0 66 0 122 0
查 0 66 0 122 0
學 0 66 0 122 0
前 0 9 0 47 0
半 0 6 0 47 0
球 0 8 0 47 0
後 0 0 0 51 0
十 0 0 0 51 0
八 0 0 0 51 0
日 0 0 0 51 0
即 0 0 0 51 0
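
To spot the problem programmatically, the box output above can be parsed and checked for degenerate boxes. This is only a minimal sketch: it assumes the usual box-file field order (symbol, left, bottom, right, top, page), and the names `Box`, `parse_box_lines` and `is_degenerate` are hypothetical helpers, not part of tesseract.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse box-file style lines: <symbol> <left> <bottom> <right> <top> <page>."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so the five numeric fields are always isolated.
        parts = line.rsplit(None, 5)
        if len(parts) == 5:
            # The symbol itself was whitespace (as in the lstmbox output).
            char, nums = " ", parts
        else:
            char, nums = parts[0], parts[1:]
        boxes.append(Box(char, *(int(n) for n in nums)))
    return boxes


def is_degenerate(box: Box) -> bool:
    """A box with no width or no height cannot enclose a glyph."""
    return box.right <= box.left or box.top <= box.bottom


sample = "保 0 76 0 118 0\n定 0 75 0 116 0\n保 75 2 123 245 0\n"
boxes = parse_box_lines(sample)
bad = [b.char for b in boxes if is_degenerate(b)]
print(bad)
```

Here the first two sample lines are taken from the makebox output and the third from the lstmbox output, so only the makebox entries are reported as degenerate.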

Shreeshrii, Jan 12 '21 13:01

I ran into the same issue using the pytesseract wrapper.

tesseract -v:

tesseract 4.1.3
 leptonica-1.81.1
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.0) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2
 Found AVX
 Found SSE

pip freeze:

pytesseract==0.3.9

Input img:

[image: img_0]

Code:

import cv2
import pytesseract
from PIL import Image

# Load with OpenCV (BGR) and convert to an RGB PIL image for pytesseract.
img = cv2.imread("img.png")
img_conv = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_pil = Image.fromarray(img_conv)

height, width = img.shape[:2]

# Box-file coordinates have their origin at the bottom-left of the image.
d = pytesseract.image_to_boxes(
    img_pil,
    lang="chi_tra_vert",
    config="-c tessedit_create_boxfile=1 --dpi 100 --tessdata-dir ./",
    output_type=pytesseract.Output.DICT,
)
print(d)
for i in range(len(d["left"])):
    text = d["char"][i]
    x1, x2 = d["left"][i], d["right"][i]
    y1, y2 = d["bottom"][i], d["top"][i]
    # Flip y because OpenCV's origin is at the top-left.
    cv2.rectangle(img, (x1, height - y1), (x2, height - y2), (0, 255, 0), 2)
    cv2.imshow("img", img)
    cv2.waitKey(0)

output:

[image: img_screenshot_26 07 2022]

{'char': ['國', '之', '章', '、', '藍', '英', '國', '下', '繁', '始', '比', '坊', '和', '好', '疏', '策', '、', '不', '過', '對', '於', '新', '本'], 'left': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'bottom': [70, 70, 62, 62, 62, 70, 70, 62, 39, 37, 42, 30, 40, 40, 39, 7, 15, 11, 10, 9, 7, 0, 8], 'right': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'top': [93, 95, 99, 99, 99, 95, 95, 99, 60, 64, 61, 70, 60, 60, 60, 30, 26, 31, 33, 31, 31, 38, 32], 'page': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}

pytesseract.image_to_boxes uses makebox as well. Any idea why left and right end up being all zero? The top and bottom coordinates are clearly incorrect too. I tested the same code on an input image with English text and lang="eng", and it worked perfectly fine.
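
Until the cause is known, it may help to filter out the zero-width boxes before drawing. The sketch below is not a fix. It takes a dictionary shaped like the output above, converts bottom-left-origin box coordinates to the top-left-origin corners OpenCV expects, and records which entries are degenerate. `to_top_left_rects` is a hypothetical helper name, and the second entry in the sample data is made up for illustration.

```python
def to_top_left_rects(d, image_height):
    """Convert box-file coordinates (origin bottom-left) to top-left-origin corners.

    Returns (rects, degenerate): rects holds (char, (x1, y1), (x2, y2)) tuples
    usable with cv2.rectangle; degenerate holds indices of zero-size boxes.
    """
    rects, degenerate = [], []
    for i, ch in enumerate(d["char"]):
        left, bottom = d["left"][i], d["bottom"][i]
        right, top = d["right"][i], d["top"][i]
        if right <= left or top <= bottom:
            # Zero width or height: nothing sensible to draw.
            degenerate.append(i)
            continue
        # Flip the y axis: box-file "top" is measured from the image bottom.
        rects.append((ch, (left, image_height - top), (right, image_height - bottom)))
    return rects, degenerate


# First entry mirrors the reported output; the second is invented for contrast.
sample = {
    "char": ["國", "A"],
    "left": [0, 10],
    "bottom": [70, 20],
    "right": [0, 30],
    "top": [93, 50],
    "page": [0, 0],
}
rects, degenerate = to_top_left_rects(sample, image_height=100)
print(rects, degenerate)
```

With the dictionary reported above, every entry would land in the degenerate list, since left and right are zero throughout.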

konstantinhenke, Jul 26 '22 10:07