
Non-linear grayscale normalization for layout analysis and/or text recognition

Open: JKamlah opened this issue 2 years ago • 6 comments

Non-linear grayscale normalization

Draft PR only

I would first like to use the draft PR option to get feedback on whether the grayscale normalization works smoothly on a wide variety of documents and proves beneficial. Please test it extensively.

Image normalization

In some cases, image normalization is applied to improve layout analysis (LA) and OCR results. A popular method is nlbin, a non-linear grayscale normalization with optional subsequent binarization. It was developed by Thomas Breuel for the text recognition system OCRopus.
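The core idea of nlbin can be shown with a small, stdlib-only C++ sketch: estimate the paper background with a high-percentile filter, subtract it to flatten uneven illumination, then stretch the contrast between a low and a high percentile. This is not the code from this PR and does not use Leptonica; the names `GrayImage`, `Percentile`, `PercentileFilter` and `NlbinNormalize` are hypothetical. It also simplifies OCRopus's `ocropus-nlbin` (no downscaling step, no border estimation, plain 1-D windows), and the defaults (80th percentile, range 20, 5th/90th percentiles) are, as far as I recall, those of `ocropus-nlbin`; please double-check them. Like this PR, it stops after normalization and leaves binarization to a later step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Grayscale image, row-major, values in [0, 1] (0 = black ink, 1 = white paper).
struct GrayImage {
  int width = 0;
  int height = 0;
  std::vector<float> px;
  float at(int x, int y) const { return px[static_cast<std::size_t>(y) * width + x]; }
  float& at(int x, int y) { return px[static_cast<std::size_t>(y) * width + x]; }
};

float Clamp01(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Value at percentile `perc` (0..100) of a non-empty sample (lower nearest rank).
float Percentile(std::vector<float> values, float perc) {
  const std::size_t k = static_cast<std::size_t>(perc / 100.0 * (values.size() - 1));
  std::nth_element(values.begin(), values.begin() + static_cast<std::ptrdiff_t>(k), values.end());
  return values[k];
}

// 1-D percentile filter along rows (horizontal) or columns, with a window of
// 2 * (range / 2) + 1 pixels that is clipped at the image border.
GrayImage PercentileFilter(const GrayImage& in, float perc, int range, bool horizontal) {
  GrayImage out = in;
  const int half = range / 2;
  std::vector<float> window;
  for (int y = 0; y < in.height; ++y) {
    for (int x = 0; x < in.width; ++x) {
      window.clear();
      for (int d = -half; d <= half; ++d) {
        const int xx = horizontal ? x + d : x;
        const int yy = horizontal ? y : y + d;
        if (xx >= 0 && xx < in.width && yy >= 0 && yy < in.height) window.push_back(in.at(xx, yy));
      }
      out.at(x, y) = Percentile(window, perc);
    }
  }
  return out;
}

// Hypothetical nlbin-style normalization (grayscale only, no thresholding).
GrayImage NlbinNormalize(const GrayImage& img, float perc = 80.0f, int range = 20,
                         float lo_perc = 5.0f, float hi_perc = 90.0f) {
  // 1. Background estimate: a bright percentile, first along rows, then along columns.
  GrayImage bg = PercentileFilter(img, perc, range, /*horizontal=*/true);
  bg = PercentileFilter(bg, perc, range, /*horizontal=*/false);
  // 2. Flatten: subtracting the background pushes paper towards 1 everywhere.
  GrayImage flat = img;
  for (std::size_t i = 0; i < flat.px.size(); ++i) flat.px[i] = Clamp01(img.px[i] - bg.px[i] + 1.0f);
  // 3. Stretch so that the lo/hi percentiles map to 0/1, then clip.
  const float lo = Percentile(flat.px, lo_perc);
  const float hi = Percentile(flat.px, hi_perc);
  const float span = std::max(hi - lo, 1e-6f);
  for (float& v : flat.px) v = Clamp01((v - lo) / span);
  return flat;
}
```

On a synthetic 30×30 page whose paper brightness falls from 0.9 (bottom) to 0.5 (top), with ink 0.4 darker than the local paper, the darkest paper (0.5) is as dark as the faintest ink (0.5), so no single global threshold separates them. After `NlbinNormalize`, every ink pixel is below 0.3 and every paper pixel above 0.7.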

In this PR, the nlbin method is adapted to use existing Leptonica functions. It can be enabled via a parameter for layout analysis and/or the actual text recognition. It only performs the grayscale normalization; the existing binarization methods can still be applied afterwards.

The "preprocess_graynorm_mode" parameter

This is an integer parameter with currently four modes, set with "-c preprocess_graynorm_mode=INT". The modes are:

0 = no normalization (default)
1 = apply normalization for thresholding and recognition
2 = apply normalization for thresholding only
3 = apply normalization for recognition only
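To make the mode semantics concrete, here is a hypothetical C++ sketch of how the four values of preprocess_graynorm_mode could map to the two stages that receive the normalized image. The enum, struct and function names are invented for illustration and are not taken from the PR; treating unknown values like 0 (no normalization) is also an assumption.

```cpp
#include <cassert>

// Hypothetical mapping of preprocess_graynorm_mode values (names are illustrative only).
enum class GraynormMode {
  kOff = 0,
  kThresholdAndRecognition = 1,
  kThresholdOnly = 2,
  kRecognitionOnly = 3,
};

// Which pipeline stages should receive the normalized image.
struct GraynormTargets {
  bool thresholding;  // binarization / layout analysis input
  bool recognition;   // text recognizer input
};

GraynormTargets TargetsForMode(int mode) {
  switch (static_cast<GraynormMode>(mode)) {
    case GraynormMode::kThresholdAndRecognition:
      return {true, true};
    case GraynormMode::kThresholdOnly:
      return {true, false};
    case GraynormMode::kRecognitionOnly:
      return {false, true};
    case GraynormMode::kOff:
    default:
      return {false, false};  // 0 or an unknown value: leave the image untouched
  }
}
```

Splitting "where to apply" into two booleans keeps the rest of the pipeline independent of the numeric mode values.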

Modes 1-3 are applied to the full image. Normalization at the line level would also be desirable (not implemented yet).

Additional option

With the parameter "-c tessedit_write_images=1", the normalized image can be written out as a TIFF file.

JKamlah, Jul 04 '22 13:07

Hi @JKamlah,

Leptonica has some built-in grayscale normalization functions; maybe we can use them as well.

https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/adaptmap.c

Here are some examples that demonstrate how to use them to improve thresholding using Otsu's or Sauvola's methods:

https://github.com/DanBloomberg/leptonica/blob/1297942d8b5c1a76abdde93ab4bbd5472870b937/src/binarize.c

I suggest adding at least pixContrastNorm(), so that it can be followed by Sauvola binarization.

amitdo, Jul 06 '22 04:07
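The idea of pairing contrast normalization with Sauvola can be illustrated with a stdlib-only C++ sketch. It is not Leptonica's implementation: `Gray8`, `TileContrastNorm` and `SauvolaBinarize` are hypothetical names, the tile logic is simplified (each tile's min/max is stretched to 0..255, low-contrast tiles borrow min/max from the nearest tile with enough contrast, and no smoothing is applied), and k = 0.35 with R = 128 are common Sauvola choices assumed here. Real code would call Leptonica's own pixContrastNorm() and the Sauvola routines in binarize.c.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// 8-bit grayscale image, row-major (0 = black, 255 = white).
struct Gray8 {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> px;
  int at(int x, int y) const { return px[y * width + x]; }
};

// Simplified tile-wise contrast normalization in the spirit of pixContrastNorm().
Gray8 TileContrastNorm(const Gray8& in, int tile_w, int tile_h, int mindiff) {
  const int nx = (in.width + tile_w - 1) / tile_w;
  const int ny = (in.height + tile_h - 1) / tile_h;
  std::vector<int> tmin(nx * ny, 255), tmax(nx * ny, 0);
  for (int y = 0; y < in.height; ++y)
    for (int x = 0; x < in.width; ++x) {
      const int t = (y / tile_h) * nx + x / tile_w;
      tmin[t] = std::min(tmin[t], in.at(x, y));
      tmax[t] = std::max(tmax[t], in.at(x, y));
    }
  // A tile is usable if its contrast reaches mindiff (at least 1 to avoid dividing by 0).
  std::vector<bool> valid(nx * ny);
  for (int t = 0; t < nx * ny; ++t) valid[t] = tmax[t] - tmin[t] >= std::max(1, mindiff);
  // Low-contrast tiles take min/max from the nearest usable tile (squared grid distance).
  std::vector<int> lo = tmin, hi = tmax;
  for (int t = 0; t < nx * ny; ++t) {
    if (valid[t]) continue;
    int best = -1, best_d = 0;
    for (int s = 0; s < nx * ny; ++s) {
      if (!valid[s]) continue;
      const int dx = s % nx - t % nx, dy = s / nx - t / nx;
      const int d = dx * dx + dy * dy;
      if (best < 0 || d < best_d) { best = s; best_d = d; }
    }
    if (best < 0) return in;  // no tile has enough contrast: leave the image unchanged
    lo[t] = tmin[best];
    hi[t] = tmax[best];
  }
  // Linearly map each tile's [lo, hi] to [0, 255], clipping values outside the range.
  Gray8 out = in;
  for (int y = 0; y < in.height; ++y)
    for (int x = 0; x < in.width; ++x) {
      const int t = (y / tile_h) * nx + x / tile_w;
      const int v = (in.at(x, y) - lo[t]) * 255 / (hi[t] - lo[t]);
      out.px[y * in.width + x] = static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
    }
  return out;
}

// Sauvola: T = m * (1 + k * (s / R - 1)) with local mean m and standard deviation s
// over a (2*half+1)^2 window clipped at the border; R = 128 for 8-bit data.
// Returns true for ink pixels (darker than their local threshold).
std::vector<bool> SauvolaBinarize(const Gray8& img, int half, double k) {
  const double kR = 128.0;
  std::vector<bool> ink(img.px.size(), false);
  for (int y = 0; y < img.height; ++y)
    for (int x = 0; x < img.width; ++x) {
      double sum = 0.0, sum_sq = 0.0;
      int n = 0;
      for (int yy = std::max(0, y - half); yy <= std::min(img.height - 1, y + half); ++yy)
        for (int xx = std::max(0, x - half); xx <= std::min(img.width - 1, x + half); ++xx) {
          const double v = img.at(xx, yy);
          sum += v;
          sum_sq += v * v;
          ++n;
        }
      const double m = sum / n;
      const double s = std::sqrt(std::max(0.0, sum_sq / n - m * m));
      ink[y * img.width + x] = img.at(x, y) < m * (1.0 + k * (s / kR - 1.0));
    }
  return ink;
}
```

Borrowing min/max from neighboring tiles matters: if low-contrast tiles were simply left unchanged, their unstretched gray would sit next to stretched white and Sauvola would tend to mark false ink along the tile seam.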

CC: @bertsky

amitdo, Jul 06 '22 04:07

You can use this image for testing the new feature:

https://github.com/DanBloomberg/leptonica/blob/a14036fa5f5ea971/prog/w91frag.jpg

amitdo, Jul 06 '22 04:07

Thanks @amitdo and @bertsky for the great feedback. I will try to improve the current design and add an option to switch between the non-linear normalization and pixContrastNorm(), perhaps via a parameter named preprocess_graynorm_method.

JKamlah, Jul 07 '22 08:07

Tesseract's Otsu is implemented in src/ccstruct/otsuthr.cpp and src/ccstruct/otsuthr.h.

I suggest moving ImageThresholder::pixNLNorm() to a separate .cpp file with its own .h file.

amitdo, Apr 02 '23 11:04
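Because the normalized grayscale image would still go through an existing thresholder such as Otsu, here is a minimal C++ sketch of Otsu's method on an 8-bit histogram for reference. It is a generic textbook version, not a copy of src/ccstruct/otsuthr.cpp (which works per channel on an image rectangle); the function name and the convention that values <= t form the dark class are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Otsu's method: choose the threshold t that maximizes the between-class variance
// w0 * w1 * (mu0 - mu1)^2, where class 0 holds values <= t and class 1 values > t.
int OtsuThreshold(const std::vector<std::uint8_t>& pixels) {
  std::array<long, 256> hist{};
  for (std::uint8_t v : pixels) ++hist[v];
  const double total = static_cast<double>(pixels.size());
  double sum_all = 0.0;
  for (int i = 0; i < 256; ++i) sum_all += static_cast<double>(i) * hist[i];
  double w0 = 0.0, sum0 = 0.0, best_var = -1.0;
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];                              // pixel count of class 0
    sum0 += static_cast<double>(t) * hist[t];   // intensity sum of class 0
    const double w1 = total - w0;
    if (w0 == 0.0 || w1 == 0.0) continue;       // both classes must be non-empty
    const double mu0 = sum0 / w0;
    const double mu1 = (sum_all - sum0) / w1;
    const double between = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
    if (between > best_var) {
      best_var = between;
      best_t = t;
    }
  }
  return best_t;
}
```

Otsu picks a single global threshold from the histogram, which is why flattening the background first (as the normalization in this PR does) can help on unevenly lit pages.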

Thank you for the idea, @amitdo, and sorry for not responding for so long. I will get back to you in the coming weeks (not before Easter) with a revised version. Maybe it will make it into the next Tesseract release.

JKamlah, Apr 03 '23 08:04