tobac icon indicating copy to clipboard operation
tobac copied to clipboard

`feature` variable in *track* not all existed in *mask*

Open zxdawn opened this issue 3 years ago • 9 comments

As linking_trackpy uses the mask and features (generated by segmentation) to track data, the feature in the track should also exist in the mask. However, it's not. I write a simple notebook to show this problem using tobac sample data.

zxdawn avatar Jun 07 '21 05:06 zxdawn

I encounter the same issue when the threshold in parameters_segmentation is set below or equal to the maximum threshold used in parameters_features (for a feature detection with target='minimum'). So I guess this issue is related to the watershedding method for segmentation or that features with identified at the largest threshold are simply not segmented, because they already contain all values up to this threshold?

Here is a notebook that compares segmentation masks with their corresponding features when the segmentation threshold is 1. set larger than the highest brightness temperature threshold in parameters_features and 2. set equal to the highest brightness temperature threshold in parameters_features. For the first method, almost all features are found in the segmentation mask, but still not 100 %. What could be the reason for this?

JuliaKukulies avatar Jun 07 '21 09:06 JuliaKukulies

Do we think this is a bug, or is this just documentation issues? I'm clearly trying to clear some things out today :).

freemansw1 avatar Jun 28 '22 22:06 freemansw1

As mentioned by @JuliaKukulies, it's related to the threshold. But, we don't understand why not all features are included in the masks. If the method is realistic, then it should be the documentation issue, otherwise, it's a bug ...

zxdawn avatar Jun 29 '22 09:06 zxdawn

Hmm good question @freemansw1! Are you familiar enough with the watershedding and how the mask is created from it to know if anything could lead to the fact that not all feature IDs are present in the mask files?

My first thought was that it could make sense that this occurs when the segmentation threshold is set equal to one of the thresholds set in the feature detection. In this case, features may not be segmented/augmented, because the original feature already contained the values to which the area should be extended (hence ncells = 0, no mask in the mask file as discussed in #25)? But now that I am thinking of it again it should at least result in the area of the feature, since the watershedding is done from the center point of the feature and not from its contours right?

On top that this issue also occurs when the segmentation threshold is significantly lower/higher than the minimum/maximum of the threshold for the feature detection. So any ideas? It seems to me like something that should not happen, because in theory every detected feature can also be segmented or not?

JuliaKukulies avatar Jun 29 '22 10:06 JuliaKukulies

I see two potential things here, but maybe I'm misunderstanding the issue and we can iterate on examples more:

  1. If the feature detection puts the feature location outside of the threshold area, the segmentation won't find anything. For example, I've made a little doodle of this here: image . This can be helped with the introduction of the "box" seeding introduced in #127 , but that's currently designed for 3D features only. It wouldn't be that challenging to extend it to 2D cases, I don't think.
  2. If the segmentation threshold is lower (assuming target=minimum) than the highest feature detection threshold (e.g., segmentation threshold is 275 and the feature detection thresholds are [250, 275, 290]), the features detected on the higher thresholds may not necessarily have a segmentation associated with them.

I do wonder more generally if we should allow for saving of the entire feature area on feature detection, but that's a discussion for another issue (and probably only once we've moved to xarray).

freemansw1 avatar Jun 29 '22 19:06 freemansw1

Thanks for sharing your thoughts @freemansw1! That makes definitely sense - I had not thought about these cases before and I could imagine that case 1 applies in my example notebook. That would mean that this is not really a bug, so a suggestion would be to verify your two suggested cases and then just add this as a note on the improved documentation page? Or do you think it is not worth mentioning anywhere?

And yes, I think it would be great to save the feature area in the feature detection dataframe (and some additional statistics on the data distribution within that area, i.e. min, max, mean values). This brings us back to our discussion in #18, and I am thinking that I will divide issue #18 into two issues:

  1. ability to choose different minimum areas ( n_min_threshold) for different thresholds
  2. save additional information such as feature area and statistics of values inside area to feature dataframe

Would you agree with that? I think I had put them originally together, because I thought when you modify n_min_threshold the way that it can be a list/dict, we could directly save information on the feature area, which we would probably make use of in the same part of the code. But these are definitely two separate features for tobac to implement and I agree that the second one maybe can wait until we have moved everything to xarray, while the first is independent of that.

JuliaKukulies avatar Jun 30 '22 08:06 JuliaKukulies

Just a small update here: I realized that we actually do save the feature area (nr. of grid cells within threshold) in num or do I misunderstand this? --> see Feature detection output

JuliaKukulies avatar Jul 01 '22 11:07 JuliaKukulies

Thanks for sharing your thoughts @freemansw1! That makes definitely sense - I had not thought about these cases before and I could imagine that case 1 applies in my example notebook. That would mean that this is not really a bug, so a suggestion would be to verify your two suggested cases and then just add this as a note on the improved documentation page? Or do you think it is not worth mentioning anywhere?

Yes, agreed on this. I'll think of the best way to do this and can put a page on this example in the PR. I think having a separate page just addressing this issue (not all features detected have segmented areas) is probably the way that I will go given the likelihood that this will happen to people (this is one of our oldest issues, after all!).

And yes, I think it would be great to save the feature area in the feature detection dataframe (and some additional statistics on the data distribution within that area, i.e. min, max, mean values). This brings us back to our discussion in #18, and I am thinking that I will divide issue #18 into two issues:

Yes, I think that splitting this issue is a good idea, and I should probably try to get back to that for v1.4.x or v1.5.x. I actually was talking about saving the location of all feature points, rather than bulk stats, but yes as we discussed in that issue, saving bulk stats is also something we should do. Thanks for raising that and thanks in advance for splitting it :)

Just a small update here: I realized that we actually do save the feature area (nr. of grid cells within threshold) in num or do I misunderstand this? --> see Feature detection output

Yes- I believe that this is true, but I haven't extensively validated that output, and I don't think we have tests to ensure that it is right.

freemansw1 avatar Jul 01 '22 15:07 freemansw1

Yes, agreed on this. I'll think of the best way to do this and can put a page on this example in the PR. I think having a separate page just addressing this issue (not all features detected have segmented areas) is probably the way that I will go given the likelihood that this will happen to people (this is one of our oldest issues, after all!).

OK, sounds great!

Yes, I think that splitting this issue is a good idea, and I should probably try to get back to that for v1.4.x or v1.5.x. I actually was talking about saving the location of all feature points, rather than bulk stats, but yes as we discussed in that issue, saving bulk stats is also something we should do. Thanks for raising that and thanks in advance for splitting it :)

I have modified #18 so that it only contains what the name of the issue suggests and opened #153 as a new issue.

JuliaKukulies avatar Jul 02 '22 16:07 JuliaKukulies

This has been resolved with #195.

freemansw1 avatar Oct 31 '22 15:10 freemansw1