Non-linear grayscale normalization for layout analysis and/or text recognition
Non-linear grayscale normalization
Draft PR only
I would first like to use the draft PR to get feedback on whether the grayscale normalization works smoothly on a wide variety of documents and actually proves beneficial. Please test extensively.
Image normalization
In some cases, image normalization is applied to improve layout analysis (LA) and OCR results. A popular method is nlbin, a non-linear grayscale normalization with optional subsequent binarization. It was developed by Thomas Breuel for the text recognition program OCRopus.
In this PR, the nlbin method was adapted to use existing Leptonica functions. It can be activated via a parameter for layout analysis and/or the actual text recognition. It only performs grayscale normalization; any of the existing binarization methods can still be applied afterwards.
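To make the idea behind nlbin more concrete, here is a minimal C++ sketch. It is not the code in this PR and does not use Leptonica. It assumes a grayscale image stored as floats in [0,1] with white = 1. The background is estimated as a high percentile (80th) of a local window, the image is flattened against that background, and the result is stretched so that the 5th and 90th percentiles map to 0 and 1. These percentiles are assumed to match the ocropus-nlbin defaults. The original script also downscales the image and filters separably before estimating the background; both steps are omitted here. The window radius of 10 is an arbitrary choice. Gray, percentile and nlnorm are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Grayscale image in row-major order, values in [0,1], white = 1.
struct Gray {
  int w, h;
  std::vector<float> px;
  float at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Value at percentile p (0..100) of v: sort a copy and take the element at
// the rounded index p/100 * (n-1).
static float percentile(std::vector<float> v, float p) {
  std::sort(v.begin(), v.end());
  std::size_t k = static_cast<std::size_t>(p / 100.0f * (v.size() - 1) + 0.5f);
  return v[std::min(k, v.size() - 1)];
}

// Hypothetical helper: nlbin-style non-linear grayscale normalization.
// radius: half-size of the square background window (arbitrary here);
// perc: percentile used as background estimate;
// lo/hi: percentiles mapped to 0 and 1 in the final contrast stretch.
Gray nlnorm(const Gray& img, int radius = 10, float perc = 80.0f,
            float lo = 5.0f, float hi = 90.0f) {
  assert(radius >= 0 && lo < hi);
  Gray flat{img.w, img.h, std::vector<float>(img.px.size())};
  std::vector<float> window;
  for (int y = 0; y < img.h; ++y) {
    for (int x = 0; x < img.w; ++x) {
      // Collect the neighbourhood, clamping coordinates at the image border.
      window.clear();
      for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx)
          window.push_back(img.at(std::clamp(x + dx, 0, img.w - 1),
                                  std::clamp(y + dy, 0, img.h - 1)));
      // A high percentile ignores the (few, dark) text pixels in the window.
      float bg = percentile(window, perc);
      // Flatten: pixels matching the local background end up near 1.
      flat.px[static_cast<std::size_t>(y) * img.w + x] =
          std::clamp(img.at(x, y) - bg + 1.0f, 0.0f, 1.0f);
    }
  }
  // Global contrast stretch between the lo and hi percentiles.
  float vlo = percentile(flat.px, lo), vhi = percentile(flat.px, hi);
  float range = std::max(vhi - vlo, 1e-6f);
  for (float& v : flat.px) v = std::clamp((v - vlo) / range, 0.0f, 1.0f);
  return flat;
}
```

On an image whose background brightness varies from left to right, the flattening step removes the gradient, so text and background end up close to 0 and 1 everywhere. That is what makes a subsequent global or local binarization easier.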
The "preprocess_graynorm_mode" parameter
This parameter is an INT member that currently has four modes. It can be set with "-c preprocess_graynorm_mode=INT". The modes are:
0 = no normalization (default)
1 = apply normalization for thresholding and recognition
2 = apply normalization for thresholding only
3 = apply normalization for recognition only
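The sketch below shows how the four modes map onto the two places where normalization can be applied. DecodeGraynormMode and GraynormTargets are hypothetical names for illustration and are not taken from the PR. Treating unknown values like mode 0 is also an assumption.

```cpp
// Hypothetical: which stages receive the normalized image for a given
// value of preprocess_graynorm_mode.
struct GraynormTargets {
  bool thresholding;  // image used for thresholding / layout analysis
  bool recognition;   // image passed to text recognition
};

constexpr GraynormTargets DecodeGraynormMode(int mode) {
  switch (mode) {
    case 1: return {true, true};     // thresholding & recognition
    case 2: return {true, false};    // thresholding only
    case 3: return {false, true};    // recognition only
    default: return {false, false};  // 0 (default) or unknown: no normalization
  }
}
```

Keeping the two flags separate makes it easy to apply normalization to just one stage. That is useful when comparing its effect on layout analysis and on recognition independently.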
Modes 1-3 are applied to the full image. Normalization at line level would also be desirable but is not implemented yet.
Additional option
With the parameter "-c tessedit_write_images=1", the normalized image can be written out as a TIFF file.
Hi @JKamlah,
Leptonica has some built-in grayscale normalization functions; maybe we can use them as well.
https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/adaptmap.c
Here are some examples that demonstrate how to use them to improve thresholding using Otsu's or Sauvola's methods:
https://github.com/DanBloomberg/leptonica/blob/1297942d8b5c1a76abdde93ab4bbd5472870b937/src/binarize.c
I suggest trying to add at least pixContrastNorm(), so it can later be followed by Sauvola.
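To illustrate the proposed pipeline, here is a simplified C++ sketch of local contrast normalization followed by Sauvola binarization. It does not call Leptonica and is not a copy of pixContrastNorm(). Each tile is stretched linearly from its own [min, max] range to [0, 255]. Leptonica additionally smooths the tile min/max maps and handles low-contrast tiles differently. In this sketch, tiles with a range below mindiff are simply left unchanged. The Sauvola part uses the usual threshold T = m * (1 + k * (s / R - 1)) with R = 128 for 8-bit images and a plain (non-integral-image) window. Img8, ContrastNormSketch and SauvolaSketch are hypothetical names, and k = 0.34 is just a commonly used value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// 8-bit grayscale image, row-major, 0 = black, 255 = white.
struct Img8 {
  int w, h;
  std::vector<std::uint8_t> px;
  std::uint8_t& at(int x, int y) { return px[y * w + x]; }
  std::uint8_t at(int x, int y) const { return px[y * w + x]; }
};

// Tile-wise linear stretch of [min, max] to [0, 255] (simplified idea of
// pixContrastNorm(); no smoothing of the min/max maps).
Img8 ContrastNormSketch(const Img8& in, int sx, int sy, int mindiff) {
  assert(sx > 0 && sy > 0);
  Img8 out = in;
  for (int ty = 0; ty < in.h; ty += sy)
    for (int tx = 0; tx < in.w; tx += sx) {
      int x1 = std::min(tx + sx, in.w), y1 = std::min(ty + sy, in.h);
      int lo = 255, hi = 0;
      for (int y = ty; y < y1; ++y)
        for (int x = tx; x < x1; ++x) {
          lo = std::min<int>(lo, in.at(x, y));
          hi = std::max<int>(hi, in.at(x, y));
        }
      if (hi - lo < mindiff) continue;  // low-contrast tile: keep as is
      for (int y = ty; y < y1; ++y)
        for (int x = tx; x < x1; ++x)
          out.at(x, y) = static_cast<std::uint8_t>(255 * (in.at(x, y) - lo) / (hi - lo));
    }
  return out;
}

// Sauvola: pixel is background (255) if it is brighter than
// T = m * (1 + k * (s / 128 - 1)), with local mean m and std. deviation s.
Img8 SauvolaSketch(const Img8& in, int half, double k) {
  Img8 out = in;
  for (int y = 0; y < in.h; ++y)
    for (int x = 0; x < in.w; ++x) {
      double sum = 0.0, sum2 = 0.0;
      int n = 0;
      for (int yy = std::max(0, y - half); yy <= std::min(in.h - 1, y + half); ++yy)
        for (int xx = std::max(0, x - half); xx <= std::min(in.w - 1, x + half); ++xx) {
          double v = in.at(xx, yy);
          sum += v;
          sum2 += v * v;
          ++n;
        }
      double m = sum / n;
      double s = std::sqrt(std::max(0.0, sum2 / n - m * m));
      double t = m * (1.0 + k * (s / 128.0 - 1.0));
      out.at(x, y) = in.at(x, y) > t ? 255 : 0;
    }
  return out;
}
```

A typical call sequence would be SauvolaSketch(ContrastNormSketch(img, sx, sy, mindiff), half, k). The normalization step evens out local contrast, so that a single k works across regions that were originally faint or dark.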
CC: @bertsky
You can use this image for testing the new feature:
https://github.com/DanBloomberg/leptonica/blob/a14036fa5f5ea971/prog/w91frag.jpg
Thanks @amitdo and @bertsky for the great feedback.
I will try to optimize the current implementation design and add an option to switch between non-linear normalization and pixContrastNorm(), maybe with a parameter preprocess_graynorm_method.
Tesseract's Otsu is implemented in src/ccstruct/otsuthr.cpp and src/ccstruct/otsuthr.h.
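For reference, Otsu's method picks the global threshold that maximizes the between-class variance of the gray-level histogram. Because it uses a single threshold, it benefits most from a prior normalization that flattens uneven backgrounds. The following C++ sketch is a generic textbook version on an 8-bit histogram, not a copy of otsuthr.cpp. OtsuThresholdSketch is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Otsu's method: choose t maximizing w0 * w1 * (mu0 - mu1)^2, where class 0
// contains the values <= t and class 1 the values > t.
int OtsuThresholdSketch(const std::vector<std::uint8_t>& px) {
  assert(!px.empty());
  std::array<double, 256> hist{};
  for (std::uint8_t v : px) hist[v] += 1.0;
  double total = static_cast<double>(px.size()), sum_all = 0.0;
  for (int i = 0; i < 256; ++i) sum_all += i * hist[i];
  double w0 = 0.0, sum0 = 0.0, best = -1.0;
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];  // number of pixels with value <= t
    sum0 += t * hist[t];
    double w1 = total - w0;
    if (w0 == 0.0 || w1 == 0.0) continue;  // one class empty: skip
    double mu0 = sum0 / w0, mu1 = (sum_all - sum0) / w1;
    double between = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
    if (between > best) {  // strict '>' keeps the lowest optimal t
      best = between;
      best_t = t;
    }
  }
  return best_t;  // pixels <= best_t are treated as foreground (dark)
}
```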
I suggest moving ImageThresholder::pixNLNorm() to a separate .cpp file and also adding a separate .h file.
Thank you for the idea, @amitdo. I am sorry for not responding for so long. I will get back to you in the coming weeks (not before Easter) with a revised version. Maybe it will fit into the next Tesseract release.