Circular thresholding (e.g. for hue or orientation features)
Description:
Every once in a while, I have the requirement for a thresholding algo which works on a circular feature (such as hue in some color spaces or orientation angle).
Here's a list of algos which do exactly that: https://en.wikipedia.org/wiki/Circular_thresholding (in particular Lai2014)
If you are interested, I could give it a shot.
The first author even provides source code (no license given). I could ask him if he's ok with including his algo as a Python rewrite.
Hey @pohlt, welcome! :) Thanks for the suggestion and especially the context around it.
At first glance, this seems like a good fit. From our review guide:
> In general, we are looking to include algorithms and methods which are established, well documented in the literature and widely used by the imaging community. While this is not a hard requirement, new contributions should be consistent with our mission.
I don't have personal experience with circular thresholding algorithms. But Lai2014's algorithm, which you mention, seems to be the most likely candidate to include. So from my side, feel welcome to go ahead and reach out to the author.
I just sent a request to the author.
Do we need some sort of license statement that we can actually use the algorithm in scikit?
@stefanv would be the expert on licensing issues. But I think the ideal would be for the author to comment here (or somewhere else in public) that he makes the code available to scikit-image under the BSD-3-Clause license.
Yukun, the original author, is happy to make his code available. I asked him to post here.
As an author of the Efficient Circular Thresholding paper, I can confirm that we are happy for the algorithm to be made open source under the BSD-3 clause and included in scikit-image.
Thank you @pohlt for reaching out and thank you @yukun-lai for the kind licensing of your code!
I kind of feel bad for thinking about this only now but since the profile, @yukun-lai, is very new I don't know how to verify that you are the actual code author. :see_no_evil:
I'm currently trying to figure out if that's actually necessary and / or how we could do that so the effort on your side is minimal.
Dear Prof. Yu-Kun Lai, thank you for agreeing to publish your code under the Modified BSD license. Would you be able to change the copy on your website to have the license included? We can provide the license text, if that would be helpful. This is the easiest way for us to verify the licensing intent.
Alternatively, an email to stefanv at berkeley.edu with the statement you made above, sent from your university email address, will suffice.
I can see your concern. I used to have an old account but can't seem to access it. Just sent the email using my University email address.
Hi @stefanv, just let me know when you got the mail.
I'm currently busy, so there's no rush with the license topic.
From: Yukun Lai <LaiY4@cardiff...>
To: "stefanv@berkeley" <stefanv@berkeley...>
Subject: circular thresholding code
Date: Wed, 18 Oct 2023 09:04:11 +0000
Message-ID: LO0P265MB6053BA9C84B2E292204B9479B2D5A@LO0P265MB6053.GBRP265.PROD.OUTLOOK.COM
Dear Stefan,
As an author of the Efficient Circular Thresholding paper, I can confirm that we are happy for the algorithm to be made open source under the BSD-3 clause and included in scikit-image.
Best regards, Yukun
Thank you, @yukun-lai 🙏
A first implementation is ready: https://github.com/pohlt/scikit-image/blob/co/__co/co.ipynb
There is a naive implementation doing a lot of redundant calculations. Another version ("less") avoids some calculations of sigma. Yet another version ("updater") does something like a rolling update of the values similar to Yukun's code. The performance on my laptop is about 1-3 ms for all three versions. The updater version is typically the fastest. If you run the notebook locally, you can even play around with a few sliders (based on ipywidgets).
Any feedback is highly appreciated!
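For readers who cannot open the notebook, the idea behind a "naive" variant can be illustrated with a brute-force search over every pair of thresholds on a circular histogram. This is only a sketch under stated assumptions: the function name `circular_otsu_bruteforce` is hypothetical, the code is not taken from the notebook or from Yukun's implementation, and it makes no attempt at the redundancy-avoiding tricks of the "less" or "updater" versions.

```python
import numpy as np


def circular_otsu_bruteforce(hist):
    """Return bin indices (t1, t2) minimizing Otsu's within-class criterion.

    Class A holds bins t1 .. t2-1 (wrapping around the end of the
    histogram), class B holds all remaining bins. Hypothetical helper,
    written for clarity rather than speed (cubic in the number of bins).
    """
    hist = np.asarray(hist, dtype=float)
    n = hist.size
    pos = np.arange(n, dtype=float)
    best = (np.inf, 0, 0)
    for t1 in range(n):
        # Rotate so that bin t1 comes first; both classes are then
        # contiguous index ranges and no explicit wrapping is needed.
        h = np.roll(hist, -t1)
        for length in range(1, n):
            cost = 0.0
            for w, p in ((h[:length], pos[:length]), (h[length:], pos[length:])):
                mass = w.sum()
                if mass > 0:
                    mean = (w * p).sum() / mass
                    # Sum of squared deviations = class weight * class variance
                    # (up to the constant total mass).
                    cost += (w * (p - mean) ** 2).sum()
            if cost < best[0]:
                best = (cost, t1, (t1 + length) % n)
    return best[1], best[2]
```

Because empty bins make several threshold pairs equally good, the returned pair is just the first minimum found, not a unique answer.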
Could you please give me some guidance concerning the interface?
Similar to the signature of the linear otsu call I would like to preserve the choice to either provide an image or an already calculated histogram. Both options have their challenges:
- image: We either need to know the value range (minimum and maximum) to correctly calculate the histogram (e.g. [0, 2*pi)) or we assume that the input is already normalized such that the range is [0, 1).
- histogram: Linear otsu accepts histograms with bins of varying size which makes it difficult to correctly "wrap" the histogram. How should we handle this in the context of circular thresholding?
Thanks so much for working on this. I have a few questions:
- These functions currently take in a histogram (x, h), correct?
- I am noticing that the plots only display "naive circular otsu" in their legend. From your code, it seems like you want to display the "less" and "updater" versions as well.
> Similar to the signature of the linear otsu call I would like to preserve the choice to either provide an image or an already calculated histogram. Both options have their challenges: [...]
I am not so sure that this is good API design. But to answer your question, it seems that for the linear case the problem is currently solved by https://github.com/scikit-image/scikit-image/blob/8f864325214ac7a9876b07d80e89815a7ab4d179/skimage/filters/thresholding.py#L387
Would that work for your case as well? For the image case, I'd just document how the minimum and maximum are determined. If users want something different, they are supposed to pass in the histogram themselves. Concerning the "bins of varying size": for now, I'd put in a check that raises an error if the bins are not all the same size. Once we get to reviewing (and, in my case, understanding the actual algorithm), we may see an opportunity to address this.
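The suggested check for equally sized bins could look roughly like the following. This is a minimal sketch, assuming the histogram comes with an array of bin centers; the name `check_uniform_bins` and the tolerance are illustrative choices, not existing scikit-image API.

```python
import numpy as np


def check_uniform_bins(bin_centers, rtol=1e-6):
    """Raise ValueError unless all bins have the same width.

    Hypothetical helper; returns the common bin width on success.
    """
    centers = np.asarray(bin_centers, dtype=float)
    if centers.ndim != 1 or centers.size < 2:
        raise ValueError("need a 1-D array with at least two bin centers")
    widths = np.diff(centers)
    # Compare every width against the first one, relative tolerance only.
    if not np.allclose(widths, widths[0], rtol=rtol, atol=0.0):
        raise ValueError("circular thresholding requires equally sized bins")
    return float(widths[0])
```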
If you feel ready I'd encourage you to make a draft PR. :D
The notebook was just a proof of concept, not the actual API I have in mind. Sorry for the confusion.
- histogram: I like the check for constant bin sizes, so let's do it like this.
- image: If we just take the minimum and maximum value found in the image as the total range, the results will be broken most of the time. Forcing the user to normalize the range to [0, 1) increases the required computations. I'd propose additional parameter(s) (min_val/max_val, or range as a tuple) which the user has to provide if she wants the histogram to be calculated.
What do you think?
Adding additional parameters if the histogram is to be calculated is something that we could totally do. Could you explain why the algorithm needs to know these min and max values to be correct in most cases? Is that to correctly judge the distance between the two bins that wrap around from the end to the start of the histogram / value range?
Sure, let's assume you calculated the hue values of an image which are often in the range [0, 2pi) and you want to use those hue values to do circular thresholding. The image most likely does not contain exactly 0 or 2pi, so taking the min/max values of the image to calculate the histogram would be (slightly) off.
Therefore, to calculate a correct circular thresholding value you have to specify the exact range.
Does that make sense?
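A small illustration of the point, using synthetic hue values (the data and variable names are made up for this sketch): with an explicit range, the histogram bins cover the full circle, whereas the default of `np.histogram` stretches the bins between the smallest and largest sample.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic hue samples that never reach exactly 0 or 2*pi.
hue = rng.uniform(0.3, 5.9, size=10_000)

nbins = 256
period = 2 * np.pi

# Bins span the full period [0, 2*pi], so the last bin is adjacent
# to the first one when the histogram is wrapped around.
hist_full, edges_full = np.histogram(hue, bins=nbins, range=(0.0, period))

# Default behaviour: bins span only [hue.min(), hue.max()], so wrapping
# this histogram would silently drop part of the circle.
hist_data, edges_data = np.histogram(hue, bins=nbins)

print("explicit range:", edges_full[0], edges_full[-1])
print("data min/max:  ", edges_data[0], edges_data[-1])
```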
Dear all,
Sorry for my late reply.
For the histogram input, I think it makes sense to only handle cases with uniform bins? For the image input, it is possible to take as input the min/max values (which are needed, unless we assume the range of values is normalised). For image input, I suppose we also need to know the bin size or the number of bins, especially if the input image has pixel values as real numbers?
In terms of different implementations, they should give identical results. The updater version is significantly faster when there are more bins (16-bit case rather than 256 bins).
Are we also considering the case with multiple thresholds (more than 2)?
Best regards, Yukun
- Histogram input: I'm fine with limiting the API to uniform bins
- Image input: As described above, we need explicit min/max values. Number of bins is certainly desirable in any case, but we could prescribe a default value of 256.
- Let's get started with a single threshold. A multi-threshold use case would be a different call where we have the same input as in the single-threshold case, but additionally the number of thresholds, so no need to consider for the simple single-threshold case.
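To make the three points concrete, here is one possible shape for the input handling, as a sketch only. The function name `prepare_circular_histogram` and the parameter names `value_range` and `nbins` are assumptions for illustration; nothing here is an agreed or existing scikit-image signature, and the thresholding step itself is left out.

```python
import numpy as np


def prepare_circular_histogram(image=None, *, hist=None, value_range=None, nbins=256):
    """Return bin counts covering one full period of a circular feature.

    Hypothetical helper: exactly one of `image` or `hist` must be given.
    """
    if (image is None) == (hist is None):
        raise ValueError("provide exactly one of `image` or `hist`")
    if hist is not None:
        # Assumed to already cover one full period with equally sized bins;
        # validating that is omitted in this sketch.
        return np.asarray(hist, dtype=float)
    if value_range is None:
        # The image alone does not reveal the period (e.g. [0, 2*pi)).
        raise ValueError("`value_range` is required when passing an image")
    lo, hi = value_range
    counts, _ = np.histogram(np.ravel(image), bins=nbins, range=(lo, hi))
    return counts.astype(float)
```

Keeping `value_range` mandatory for image input avoids guessing the period from the data, at the cost of one extra argument for the caller.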
Sorry for the slow progress, but other projects don't leave much wiggle room for fun stuff like this.
Sorry, guys. Busy times, but I still plan to get this done...
Hi,
I think the code is already working correctly. The updater version is fastest (especially when the number of bins is large, e.g. 16-bit images). For two-class cases, we only need min/max values and the number of bins (default can be 0/255, with 256 bins).
Best regards, Yukun