
Detect text rotation without running recognition

Open Balearica opened this issue 2 years ago • 2 comments

As noted in the documentation, Tesseract performs poorly when the page is at an angle (not a multiple of 90 degrees). This limitation is not problematic from an accuracy standpoint: Tesseract accurately reports the angle of text lines, so my existing pipeline rotates and re-runs recognition on any image where the angle is significant. However, this is computationally inefficient, as there does not appear to be any way to get the page angle without also running recognition (even though estimating the page angle/gradient is one of the first things calculated).

Therefore, it would be of significant benefit to be able to get the page angle without running the entire recognition process. I'll work on a build that does this myself. My initial thought is to add a config option that tells Tesseract to report the page angle and quit early (before recognition) if the median line angle is above a user-defined threshold; however, let me know if others have thoughts on the implementation.
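A minimal sketch of the proposed early-exit rule, assuming the per-line angles (in degrees) are already available from layout analysis. `median_angle_deg` and `should_quit_early` are hypothetical names for illustration, not existing Tesseract functions or config options.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of per-line skew angles in degrees.
// Takes the vector by value because std::nth_element reorders it.
double median_angle_deg(std::vector<double> angles) {
  assert(!angles.empty());
  const std::size_t mid = angles.size() / 2;
  std::nth_element(angles.begin(), angles.begin() + mid, angles.end());
  const double upper = angles[mid];
  if (angles.size() % 2 == 1) return upper;
  // Even count: after nth_element every element before mid is <= upper,
  // so the largest of them is the other middle value.
  const double lower = *std::max_element(angles.begin(), angles.begin() + mid);
  return (lower + upper) / 2.0;
}

// Hypothetical early-exit rule: report "skewed" (stop before recognition)
// when the absolute median line angle exceeds the user-defined threshold.
bool should_quit_early(const std::vector<double>& line_angles_deg, double threshold_deg) {
  if (line_angles_deg.empty()) return false;  // no lines found, nothing to decide
  return std::fabs(median_angle_deg(line_angles_deg)) > threshold_deg;
}
```

Using the median rather than the mean keeps a few badly estimated lines (for example, a rotated caption) from triggering a re-run on an otherwise straight page.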

Balearica, Jun 07 '22 03:06

For such image preprocessing, I would suggest having a look at the Leptonica examples (programs/functions) flipdetect_reg, skewtest, skew_reg, and maybe dewarptest2.

Of course, there are limitations (see, e.g., issue 622), but they are fast and reliable for most of my cases.

IMHO such preprocessing should be done outside of Tesseract.
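Leptonica's skew finders (e.g. pixFindSkew) are, as far as I understand them, based on vertically shearing the image by candidate angles and scoring how sharply the text rows line up in the horizontal projection profile. Below is a brute-force sketch of that idea on a small in-memory binary image. `BinaryImage`, `shear_profile_score`, and `estimate_skew_deg` are hypothetical names; this is not Leptonica's implementation, which also works on reduced images and refines a coarse sweep with a binary search.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr double kDegToRad = 3.14159265358979323846 / 180.0;

// Hypothetical binary image: 1 = ink, 0 = background, stored row-major.
struct BinaryImage {
  int width;
  int height;
  std::vector<std::uint8_t> pixels;  // width * height entries
  bool ink(int x, int y) const {
    return pixels[static_cast<std::size_t>(y) * width + x] != 0;
  }
};

// Score one candidate angle: shear every ink pixel vertically so that lines
// with slope tan(angle) become horizontal, accumulate the row histogram, and
// sum the squared differences between adjacent rows. Well-aligned text rows
// produce tall, narrow peaks and therefore a large score.
double shear_profile_score(const BinaryImage& img, double angle_deg, long pad) {
  const double t = std::tan(angle_deg * kDegToRad);
  std::vector<long> rows(static_cast<std::size_t>(img.height + 2 * pad), 0);
  for (int y = 0; y < img.height; ++y) {
    for (int x = 0; x < img.width; ++x) {
      if (img.ink(x, y)) {
        // pad keeps sheared rows inside the histogram for |angle| <= max_deg.
        const long row = std::lround(y - x * t) + pad;
        rows[static_cast<std::size_t>(row)] += 1;
      }
    }
  }
  double score = 0.0;
  for (std::size_t i = 1; i < rows.size(); ++i) {
    const double d = static_cast<double>(rows[i] - rows[i - 1]);
    score += d * d;
  }
  return score;
}

// Brute-force sweep over [-max_deg, max_deg] in steps of step_deg. Returns the
// best angle in degrees; positive means lines descend to the right, since the
// image y axis points down.
double estimate_skew_deg(const BinaryImage& img, double max_deg, double step_deg) {
  assert(max_deg >= 0.0 && step_deg > 0.0);
  assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
  const long pad =
      static_cast<long>(std::ceil(img.width * std::tan(max_deg * kDegToRad))) + 2;
  const int steps = static_cast<int>(std::floor(2.0 * max_deg / step_deg + 1e-9));
  double best_angle = 0.0;
  double best_score = -1.0;
  for (int i = 0; i <= steps; ++i) {
    const double angle = -max_deg + i * step_deg;
    const double score = shear_profile_score(img, angle, pad);
    if (score > best_score) {
      best_score = score;
      best_angle = angle;
    }
  }
  return best_angle;
}
```

For example, a 200x100 image containing a few one-pixel lines drawn with slope tan(3°) gives 3.0 from `estimate_skew_deg(img, 5.0, 0.5)`. The resolution is limited to `step_deg`, which is why the real implementation refines the sweep instead of just shrinking the step.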

zdenop, Jun 07 '22 05:06

Thanks for your response, I will review the Leptonica scripts linked before deciding how to implement.

Balearica, Jun 08 '22 05:06

I found a much, much faster solution to detect page rotation: call SetImage, then DetectOrientationScript, and then call

Pix *rotated = pixRotateOrth(pix, ((360 - degree) % 360) / 90);

However, there is currently a bug that causes this to fail randomly, so you need my short patch from https://github.com/tesseract-ocr/tesseract/issues/4062
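A small helper for the degree-to-quadrant conversion, assuming `orient_deg` is the clockwise rotation reported by DetectOrientationScript (0, 90, 180, or 270) and that pixRotateOrth accepts `quads` only in the range 0-3, as the rotateorth.c source indicates. Without a `% 360`, a page reported at 0 degrees would give quads = 4. `correction_quads` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper. orient_deg: clockwise rotation of the input image as
// reported by DetectOrientationScript (0, 90, 180 or 270). Returns the number
// of 90-degree clockwise turns that undo it, in the 0..3 range pixRotateOrth accepts.
int correction_quads(int orient_deg) {
  assert(orient_deg >= 0 && orient_deg < 360 && orient_deg % 90 == 0);
  // % 360 maps an unrotated page (orient_deg == 0) to 0 turns instead of 4.
  return ((360 - orient_deg) % 360) / 90;
}
```

The rotation call then becomes `pixRotateOrth(pix, correction_quads(degree))`.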

todd-richmond, Apr 27 '23 05:04

It is here: https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/rotateorth.c#L64

zdenop, Apr 27 '23 05:04


That is the API to rotate an image, but not the API to detect whether it is rotated. Tesseract docs and some StackOverflow comments recommend Recognize(), but that is extremely slow. On a sample TIFF I used, it took 0.9 seconds with DetectOrientationScript vs 2.1 seconds with Recognize, in both cases followed by a 90-degree rotation and another Recognize to extract the text.

todd-richmond, Apr 27 '23 16:04

@todd-richmond, you are talking about orientation detection: 0 / 90 / 180 / 270 degrees.

@Balearica is talking about a page with some parts that are skewed

amitdo, Apr 28 '23 08:04

Never mind. I missed the "not a multiple of 90" when reading. De-skewing is much more challenging, so we haven't bothered dealing with it for now.

todd-richmond, Apr 28 '23 16:04

@Balearica,

Did you try using AnalyseLayout()?

https://github.com/tesseract-ocr/tesseract/blob/bf7c134ba6958f2efdaace2fbeba31cad91394ce/include/tesseract/baseapi.h#L433-L449

amitdo, May 07 '23 09:05

@amitdo I did not end up implementing this way, but do believe that running AnalyseLayout and then using the lines to re-calculate the average gradient would be another way to go about this.

I ended up creating a branch that allows for retrieving the number Tesseract already calculates, which I pushed to #4070. I think this is the most direct approach, and the only approach that does not involve redundant calculations.
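The AnalyseLayout route could look roughly like this: collect the baseline endpoints of each text line (with the real API, presumably via PageIterator::Baseline at RIL_TEXTLINE on the iterator that AnalyseLayout returns) and compute a length-weighted average gradient. The sketch assumes those endpoints are already in plain structs. `LineBaseline`, `mean_gradient`, and `gradient_to_degrees` are hypothetical, and Tesseract's internal page gradient may well be computed differently.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical: baseline endpoints of one text line in image coordinates
// (y grows downward), e.g. as a PageIterator could report them.
struct LineBaseline {
  int x1, y1, x2, y2;
};

// Average gradient (dy/dx) over all lines, weighted by each line's horizontal
// extent: sum(dy) / sum(dx) equals the dx-weighted mean of per-line slopes.
// Lines with x2 <= x1 are skipped; returns nullopt when no usable line remains.
std::optional<double> mean_gradient(const std::vector<LineBaseline>& lines) {
  long long sum_dx = 0;
  long long sum_dy = 0;
  for (const LineBaseline& line : lines) {
    const long long dx = static_cast<long long>(line.x2) - line.x1;
    if (dx <= 0) continue;
    sum_dx += dx;
    sum_dy += static_cast<long long>(line.y2) - line.y1;
  }
  if (sum_dx == 0) return std::nullopt;
  return static_cast<double>(sum_dy) / static_cast<double>(sum_dx);
}

// Gradient to angle in degrees; positive means lines descend to the right.
double gradient_to_degrees(double gradient) {
  return std::atan(gradient) * 180.0 / 3.14159265358979323846;
}
```

Weighting by horizontal extent lets long body-text lines dominate short fragments, which tend to have noisier baselines.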

Balearica, May 09 '23 04:05