Doxa icon indicating copy to clipboard operation
Doxa copied to clipboard

[Feature request] AdOtsu, adaptive Otsu's method

Open Dobatymo opened this issue 2 years ago • 9 comments

Hi, thank you for this great library. There is another interesting method called AdOtsu which is an adaptive version inspired by the Otsu method. See paper AdOtsu: An adaptive and parameterless generalization of Otsu's method for document image binarization Would it be possible to implement this one? I know this is a time consuming task, so feel free to ignore it.

Dobatymo avatar Sep 07 '21 06:09 Dobatymo

Thank you for the request. I'll look into it. I have looked over the paper before... it is popular. I am always taken back by how many research papers reference an algorithm that has never been implemented except privately by the original authors. That makes it a great candidate for this library. I had originally dropped trying to implement it after getting burned by another grid based algorithm (https://github.com/brandonmpetty/Doxa/blob/master/Doxa/Bataineh.hpp)

My time updating this project is probably going to be reduced for a while in addition to the fact that I have been preparing to get a working version of Howe's algorithm out. Not exactly Adaptive, but since it has represented the SOTA for so long I thought it might be good to have.

I am going to leave this open and if anyone would like to take a stab at it, we can put it in a branch and fully flush it out. If anyone else is interested in AdOtsu, please comment here. I'll try to look over it again in the next few weeks and I'll update this thread.

brandonmpetty avatar Sep 09 '21 16:09 brandonmpetty

@Dobatymo, I have been looking into AdOtsu. The authors actually have offered up two implementations of AdOtsu in two different papers. They also offer up a "grid based multi-scale" algorithm that works for any adaptive algorithm, in addition to the actual AdOtsu implementation.

I am wrapping up the 2010 non-grid based AdOtsu now, and will look into the 2011 version you referenced in the link above. I will then try to add in the grid based multi-scale aspect. There is a lot going on there, so it may take some time.

brandonmpetty avatar May 17 '22 01:05 brandonmpetty

@Dobatymo, I know it's been a while but I have decided to push the AdOtsu code I did about a year ago into #35.

I had forgotten how far I actually got with it, but it is looking really good. Great results. I have the base AdOtsu algorithm implemented along with the Multi-Scaling algorithm The only downside is that I have not implemented the Grid system which AdOtsu calls for to speed things up. That said, adding it will only make things faster, but it will not help the results (if anything it may hurt them slightly).

Seeing as how this is the only known implementation of AdOtsu I could find, it is incredibly difficult to judge the accuracy of the implementation. Once I get the Grid component knocked out, I try to get it all confirmed if possible.

Till then, what do you think?

image [AdOtsu /w Multi-Scale, k = 1, MinAvg Grayscale Algorithm]

brandonmpetty avatar Sep 20 '23 05:09 brandonmpetty

Could it be there is even simpler simplification for the method?

If ϴ is the indicator function and T_l is the local threshold, then:

ϴ * (k * (1 - T_l) - k * T_l) + k * T_l = ϴ * k * (1 - 2 * T_l) + T_l

Where I used the fact that the Otsu threshold of an image u and 1 - u are: T_u _ T_1-u = 1.

Am I missing something?

RoyiAvital avatar Sep 22 '23 15:09 RoyiAvital

@RoyiAvital, The way I am interpreting the paper is that Moghaddam is taking the delta between the Global and Local, and if that number is within a small range it is to use the Local threshold but if it is out of range, it is to be set as background. Out of the box the range is within 25 points on a grayscale image with values from 0 to 255, but that range can be controlled through k and R'. I am not seeing that represented in your formula.

I think you are right that I could simplify things more.

This is what I currently have:

const double u = (std::abs((double)globalThreshold - k * localThreshold) / R) - 255;

return (u < 0) ? localThreshold : -1;

But I could do better:

const double u = (std::abs((double)globalThreshold - k * localThreshold) / R);

return (u < 255) ? localThreshold : -1;

Can it get any simpler than that? Note that I am working with values between 0 and 255 not 0 and 1. If I am missing something, let me know. I would understand it better if you posted a code snippet or PR representing your thought. The code is in AdOtsu.hpp. I have a test that will write out a sample file (just note that the color-to-grayscale formula is set to Qt instead of the recommended MinAvg specified in the paper; BinarizationTests.cpp).

brandonmpetty avatar Sep 23 '23 06:09 brandonmpetty

@brandonmpetty , In my simplification ϴ includes its input variables. I only simplified things in the other term: (k * (1 - T_l) - k * T_l) + k * T_l. In your range it would be: (k * (255 - T_l) - k * T_l) + k * T_l.

In the words of the article it says about background but specifically it just applies a different threshold (The complimentary). I haven't found the reference implementation, so it is a gray area.

RoyiAvital avatar Sep 23 '23 13:09 RoyiAvital

Hi, just a heads-up that the bindings for AdOtsu are missing. Noticed that when trying to use it from Python. Thanks for the addition of a new algorithm!

I'm currently testing and found that the algo chops away chunks of text, like so:

image

This seems to happen for smaller "islands of text"? Not sure. The other algorithms perform fine in this case. I'm wondering how the 3 parameters window, k and R need to be tweaked to make it perform better for such documents (photos taken with smartphone camera).

heinrich-ulbricht avatar Dec 19 '23 20:12 heinrich-ulbricht

Thanks for the feedback @heinrich-ulbricht. I was having trouble reproducing the problem. Could you provide an example input image?

I just checked in a very minor algorithm change that might have an effect when playing around with the K value. It is still in Draft, so I am still working on it, getting feedback, etc. I need to implement the paper's Grid optimization that uses interpolation to cut down on the number of calculations this thing has to run. I was hoping to get that in before I started the Bindings.

brandonmpetty avatar Dec 30 '23 05:12 brandonmpetty

I did some testing with an image of 2112x2982 pixel, cycling k and R from 0.1 to 2 in 0.1 increments. I tested window sizes of 75, 101, 125 and 151.

Here's a sample with w=101, k=1.0, R=0.1 (k and R being the defaults currently baked into AdOtsu.hpp).

image

Here's the same region with w=101, k=1.0, R=0.3.

image

I don't understand what k and R mean. I see the algorithm, but cannot infer the consequences of changing the values.

From my tests, I see there is kind of a "turning point" for k and R where it doesn't matter anymore which values they have. Make one of them too low and the image is white. Make them high and changes don't matter anymore.

The "turning points" seem to be in the range of (all with a window size of 101 and comparable documents):

  • k=2.0, R=0.1..1.1
  • k=1.8, R=0.1..0.9
  • k=1.6, R=0.1..0.6
  • k=1.4, R=0.1..0.5
  • k=1.2, R=0.1..0.4
  • k=1.0, R=0.1..0.3
  • k=0.8, R=0.1..0.4
  • k=0.6, R=0.1..0.4
  • k=0.4, R=0.3..0.5
  • k=0.1, R=0.5..0.7

(all based on the 3cfaea6e commit; and yes, it took ages to generate all of those)

The higher k, the more R can be used to "finetune" the result (it seems). But at a certain point, there are no differences anymore.

I settled for window=75, k=2.0, R=1.0, which does not "eat" text and provides a good base for OCR (which is my main goal).

AdOtsu seems to perform really good for documents with a lot of shades. For me those are the exceptions where it outperforms the other algorithms. But apart from those cases, the other algos perform better.

And in areas without any text AdOtsu (or the implementation snapshot I used) produces a lot of noise like this:

image

This drives up the file size when storing images. But it nevertheless is excellent in preserving text and thin lines, when using the right parameters.

That's my holiday experiments so far :D

(I did not yet look at your recent change. All of above might change, but so be it.)

heinrich-ulbricht avatar Dec 30 '23 11:12 heinrich-ulbricht