Error in boxClipToRectangle: box outside rectangle

Open PedroBarcha opened this issue 7 years ago • 13 comments

Hi there, I've got some specific images that produce the following output on Linux:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

The pictures are OCRed successfully by Tesseract (though without great results). The biggest problem for me, however, is that in OCRopus they don't even get OCRed.

example5 ghoby30c

Any ideas?

PedroBarcha avatar Sep 14 '16 14:09 PedroBarcha

Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

Add a white/black frame to the image and no error messages will appear.

convert  427-1.jpg  -bordercolor White -border 10x10 427-1b.jpg

Strange behaviour...
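For anyone who needs to apply this workaround to many files, here is a minimal Python sketch that wraps the same ImageMagick call. The function names `border_command` and `add_border` are made up for illustration; only the `convert` options shown above are used, and ImageMagick is assumed to be installed and on the PATH.

```python
import subprocess


def border_command(src, dst, border=10, color="White"):
    """Build the ImageMagick argument list that pads src with a solid frame."""
    return ["convert", src, "-bordercolor", color,
            "-border", f"{border}x{border}", dst]


def add_border(src, dst, border=10, color="White"):
    """Run ImageMagick; raises CalledProcessError if convert fails."""
    subprocess.run(border_command(src, dst, border, color), check=True)
```

Calling `add_border("427-1.jpg", "427-1b.jpg")` should be equivalent to the command line above; the bordered file can then be passed to Tesseract instead of the original.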

amitdo avatar Sep 19 '16 22:09 amitdo

The biggest problem for me, however, is that in OCRopus they don't even get OCRed.

This place is for bug reports about Tesseract, not OCRopus.

amitdo avatar Sep 19 '16 23:09 amitdo

@amitdo I'm getting the same issue just with Tesseract. I'm guessing OCRopus is using Tesseract and that's why he made the issue here.

erikdubbelboer avatar Jun 27 '17 09:06 erikdubbelboer

I'm guessing OCRopus is using Tesseract

Ocropy (and clstm) does not use Tesseract. A VERY OLD version of Ocropus (0.4) did use Tesseract.

amitdo avatar Jun 27 '17 12:06 amitdo

Similar issues #468 #1601

These error messages are produced by Leptonica.

They are triggered by a call to pixClipBoxToForeground()

https://github.com/DanBloomberg/leptonica/blob/bbe289cf3f0fe368d5b9eac64df2ccd6e9b05c56/src/pix5.c#L1956

https://github.com/tesseract-ocr/tesseract/search?q=pixClipBoxToForeground

amitdo avatar Jul 08 '20 15:07 amitdo

@stweil, this seems like a bug in Tesseract, maybe you can explore it and find its cause.

amitdo avatar Jul 08 '20 15:07 amitdo

https://github.com/tesseract-ocr/tesseract/search?q=pixClipBoxToForeground

I noticed that Tesseract does not check the return value from Leptonica's functions (l_ok).

amitdo avatar Jul 08 '20 15:07 amitdo

@stweil, this seems like a bug in Tesseract, maybe you can explore it and find its cause.

It's caused by a box with width / height 0, but as always in Tesseract it is difficult to find the right fix.
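To illustrate how a zero-sized box can end up "outside" an image, here is a simplified Python model of clipping a box to a rectangle. This is not Leptonica's code; the overlap test and the name `clip_to_rectangle` are assumptions made for illustration only.

```python
def clip_to_rectangle(box, width, height):
    """Clip box = (x, y, w, h) to a width x height image.

    Returns the clipped box, or None when the box has no overlap with
    the image. Assumption: this overlap test is only a model of the idea
    behind the "box outside rectangle" message, not Leptonica's code.
    """
    x, y, w, h = box
    if x >= width or y >= height or x + w <= 0 or y + h <= 0:
        return None
    left, top = max(x, 0), max(y, 0)
    right, bottom = min(x + w, width), min(y + h, height)
    return (left, top, right - left, bottom - top)
```

In this model a box with width 0 that sits on the left or right edge of the image has no overlap and is rejected, while the same box a few pixels inside the image is accepted. That would be consistent with the border workaround, but nothing in this thread confirms it is the actual cause.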

stweil avatar Jul 09 '20 10:07 stweil

This error is still present. I tried to read a 250x50 image and got the error.
After a few trials, I found that 250x51 works, so apparently there's a limit for the smallest size of image

Nemesis77swe avatar May 26 '22 15:05 Nemesis77swe

I have the same issue. I have software that fetches images via wget and then runs OCR on them with Tesseract. I noticed that with some images (or rather some resolutions, as I found out) the following errors occur:

Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

I found out that this only occurs at some resolutions. So I wrote a script to check this on an example image. The script successively decreases the resolution of the image and then runs Tesseract OCR on it. The image has a resolution of 2090x1504 pixels.
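For readers without access to the attachment, a sweep like this could look roughly like the following Python sketch. It is not the attached script: the step size, the lower bound, and all function names are assumptions. It relies on ImageMagick's `convert -resize WxH!` to force an exact size and on `tesseract <image> stdout` to run OCR.

```python
import subprocess


def scaled_sizes(width, height, step=2, min_height=50):
    """Yield (w, h) pairs with decreasing height, keeping the aspect ratio."""
    for h in range(height, min_height - 1, -step):
        yield max(1, round(width * h / height)), h


def resize_command(src, dst, w, h):
    """ImageMagick arguments that resize src to exactly w x h pixels."""
    return ["convert", src, "-resize", f"{w}x{h}!", dst]


def leptonica_errors(stderr_text):
    """Keep only the 'Error in ...' lines from Tesseract's stderr output."""
    return [line for line in stderr_text.splitlines()
            if line.startswith("Error in ")]


def sweep(src, width, height, tmp="scaled.png"):
    """Resize src step by step and print the sizes that trigger errors."""
    for w, h in scaled_sizes(width, height):
        subprocess.run(resize_command(src, tmp, w, h), check=True)
        result = subprocess.run(["tesseract", tmp, "stdout"],
                                capture_output=True, text=True)
        errors = leptonica_errors(result.stderr)
        if errors:
            print(f"h: {h}")
            print("\n".join(errors))
```

Calling `sweep("image.png", 2090, 1504)` (with a placeholder file name) would print each failing height followed by the error lines.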

There are no errors until the height reaches 1578 pixels. Then some errors appear irregularly, and from 1502 pixels on they appear for nearly every image. Some images generate several of these errors, e.g.:

h: 1094
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

Unlike @Nemesis77swe,

there's a limit for the smallest size of image

I don't think that there is a limit. I think it may be a mathematical issue somewhere in the code that produces a box with width / height of 0, as @stweil stated.

I attached the script and the output and this is the image.


Platform:

Linux notebook63 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Tesseract Version:

tesseract 5.2.0-13-g74e22
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
 Found libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3

csidirop avatar Aug 11 '22 12:08 csidirop

I tried this on another Windows machine in WSL, with the same results:

Ubuntu 20.04 (on both Windows machines) and Debian buster show exactly the same output.

csidirop avatar Aug 15 '22 10:08 csidirop

@csidirop,

Does adding a white or black border to the image help?

https://github.com/tesseract-ocr/tesseract/issues/427#issuecomment-248153491

If not, post an image that demonstrates the issue.

amitdo avatar Aug 15 '22 12:08 amitdo

Indeed, there are no errors after adding a white border

csidirop avatar Aug 15 '22 12:08 csidirop