tobac
tobac copied to clipboard
`feature` variable in *track* not all existed in *mask*
As linking_trackpy
uses the mask and features (generated by segmentation
) to track data, the feature
in the track should also exist in the mask. However, it's not. I write a simple notebook to show this problem using tobac sample data.
I encounter the same issue when the threshold in parameters_segmentation
is set below or equal to the maximum threshold used in parameters_features
(for a feature detection with target='minimum'
). So I guess this issue is related to the watershedding method for segmentation or that features with identified at the largest threshold are simply not segmented, because they already contain all values up to this threshold?
Here is a notebook that compares segmentation masks with their corresponding features when the segmentation threshold is 1. set larger than the highest brightness temperature threshold in parameters_features
and 2. set equal to the highest brightness temperature threshold in parameters_features
. For the first method, almost all features are found in the segmentation mask, but still not 100 %. What could be the reason for this?
Do we think this is a bug, or is this just documentation issues? I'm clearly trying to clear some things out today :).
As mentioned by @JuliaKukulies, it's related to the threshold. But, we don't understand why not all features are included in the masks. If the method is realistic, then it should be the documentation issue, otherwise, it's a bug ...
Hmm good question @freemansw1! Are you familiar enough with the watershedding and how the mask is created from it to know if anything could lead to the fact that not all feature IDs are present in the mask files?
My first thought was that it could make sense that this occurs when the segmentation threshold is set equal to one of the thresholds set in the feature detection. In this case, features may not be segmented/augmented, because the original feature already contained the values to which the area should be extended (hence ncells = 0, no mask in the mask file as discussed in #25)? But now that I am thinking of it again it should at least result in the area of the feature, since the watershedding is done from the center point of the feature and not from its contours right?
On top that this issue also occurs when the segmentation threshold is significantly lower/higher than the minimum/maximum of the threshold for the feature detection. So any ideas? It seems to me like something that should not happen, because in theory every detected feature can also be segmented or not?
I see two potential things here, but maybe I'm misunderstanding the issue and we can iterate on examples more:
- If the feature detection puts the feature location outside of the threshold area, the segmentation won't find anything. For example, I've made a little doodle of this here:
. This can be helped with the introduction of the "box" seeding introduced in #127 , but that's currently designed for 3D features only. It wouldn't be that challenging to extend it to 2D cases, I don't think.
- If the segmentation threshold is lower (assuming
target=minimum
) than the highest feature detection threshold (e.g., segmentation threshold is 275 and the feature detection thresholds are[250, 275, 290]
), the features detected on the higher thresholds may not necessarily have a segmentation associated with them.
I do wonder more generally if we should allow for saving of the entire feature area on feature detection, but that's a discussion for another issue (and probably only once we've moved to xarray).
Thanks for sharing your thoughts @freemansw1! That makes definitely sense - I had not thought about these cases before and I could imagine that case 1 applies in my example notebook. That would mean that this is not really a bug, so a suggestion would be to verify your two suggested cases and then just add this as a note on the improved documentation page? Or do you think it is not worth mentioning anywhere?
And yes, I think it would be great to save the feature area in the feature detection dataframe (and some additional statistics on the data distribution within that area, i.e. min, max, mean values). This brings us back to our discussion in #18, and I am thinking that I will divide issue #18 into two issues:
- ability to choose different minimum areas (
n_min_threshold
) for different thresholds - save additional information such as feature area and statistics of values inside area to feature dataframe
Would you agree with that?
I think I had put them originally together, because I thought when you modify n_min_threshold
the way that it can be a list/dict, we could directly save information on the feature area, which we would probably make use of in the same part of the code. But these are definitely two separate features for tobac
to implement and I agree that the second one maybe can wait until we have moved everything to xarray, while the first is independent of that.
Just a small update here: I realized that we actually do save the feature area (nr. of grid cells within threshold) in num
or do I misunderstand this? --> see Feature detection output
Thanks for sharing your thoughts @freemansw1! That makes definitely sense - I had not thought about these cases before and I could imagine that case 1 applies in my example notebook. That would mean that this is not really a bug, so a suggestion would be to verify your two suggested cases and then just add this as a note on the improved documentation page? Or do you think it is not worth mentioning anywhere?
Yes, agreed on this. I'll think of the best way to do this and can put a page on this example in the PR. I think having a separate page just addressing this issue (not all features detected have segmented areas) is probably the way that I will go given the likelihood that this will happen to people (this is one of our oldest issues, after all!).
And yes, I think it would be great to save the feature area in the feature detection dataframe (and some additional statistics on the data distribution within that area, i.e. min, max, mean values). This brings us back to our discussion in #18, and I am thinking that I will divide issue #18 into two issues:
Yes, I think that splitting this issue is a good idea, and I should probably try to get back to that for v1.4.x or v1.5.x. I actually was talking about saving the location of all feature points, rather than bulk stats, but yes as we discussed in that issue, saving bulk stats is also something we should do. Thanks for raising that and thanks in advance for splitting it :)
Just a small update here: I realized that we actually do save the feature area (nr. of grid cells within threshold) in
num
or do I misunderstand this? --> see Feature detection output
Yes- I believe that this is true, but I haven't extensively validated that output, and I don't think we have tests to ensure that it is right.
Yes, agreed on this. I'll think of the best way to do this and can put a page on this example in the PR. I think having a separate page just addressing this issue (not all features detected have segmented areas) is probably the way that I will go given the likelihood that this will happen to people (this is one of our oldest issues, after all!).
OK, sounds great!
Yes, I think that splitting this issue is a good idea, and I should probably try to get back to that for v1.4.x or v1.5.x. I actually was talking about saving the location of all feature points, rather than bulk stats, but yes as we discussed in that issue, saving bulk stats is also something we should do. Thanks for raising that and thanks in advance for splitting it :)
I have modified #18 so that it only contains what the name of the issue suggests and opened #153 as a new issue.
This has been resolved with #195.