Consider splits and merges in tdating
Hi! I've been working on including information on splits and merges of storm cells in tdating. This is a somewhat working version of that. Here currently,
- a cell at time t is considered to split, if the advected version of the cell overlaps more than 10% with more than one cell at time t+1
- a cell at time t is considered to be merged, if more than one advected cell (advected from t to t+1) overlaps more than 10% with it
The results are stored as additional columns in the cell dataframes:
splitted(bool): cell is considered splitsplit_IDs(list): list of IDs at next timestep that the cell split tomerged(bool): cell is considered a merge of multiple cellsmerged_IDs(list): list of IDs from previous timestep that merged into this cellresults_from_split(bool): convenience attribute that is True if the cell is a result of split (i.e. the ID of the cell is present in thesplit_IDsof some cell at previous timestep)will_merge(bool): convenience attribute that is True if the cell will merge at next timestep (i.e. the ID of the cell is present in themerge_IDsof some cell at next timestep). Will of course be empty, if the next timestep is not tracked
Additionally, this PR contains a bug fix in the pysteps.tracking.tdating.match function. As far as I can tell, in this bug the label 0 was included in possible match IDs, even though 0 denotes pixels that are not part of any cell. So there could be cases, where the advected cell would have e.g. 55% of 0 and 45% of some cell and it would not be matched to that cell. In my opinion, this bug needs to be fixed in the codebase even if the merge/split is not.
The code could probably be improved and e.g. it would benefit from some unit tests. But I'm creating this PR already for discussion especially for @pulkkins and @dnerini since this relates to my current work that you're involved in.
Very nice contribution @ritvje ! I hope to be able to look into it soon, meanwhile let's ping in the original author of this routine
@feldmann-m in case you wants to leave a feedback on this PR
Additionally, this PR contains a bug fix in the pysteps.tracking.tdating.match function. As far as I can tell, in this bug the label 0 was included in possible match IDs, even though 0 denotes pixels that are not part of any cell. So there could be cases, where the advected cell would have e.g. 55% of 0 and 45% of some cell and it would not be matched to that cell. In my opinion, this bug needs to be fixed in the codebase even if the merge/split is not.
@ritvje would it be possible to fix this bug in a separate PR?
Sure, I made a new PR with that commit.
dear @ritvje, next week @ladc and colleagues are organizing a pysteps hackathon, see https://github.com/orgs/pySTEPS/projects/2
I was thinking that this might be a good opportunity to advance on this PR? what do you think?
Sure, it would be good to finally advance this! I should have time to work on this PR this week on Wednesday or Thursday (most likely evening-time).
Regarding the failing tests, I guess the main issue is the newly-added columns for the cell dataframe. Like @dnerini suggested, those (or the whole splits/merges analysis, since it's irrelevant if the information is not returned?) can be made optional.
Codecov Report
Attention: Patch coverage is 97.19626% with 3 lines in your changes missing coverage. Please review.
Project coverage is 83.74%. Comparing base (
953f799) to head (5c8fdce). Report is 3 commits behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| pysteps/tests/test_feature_tstorm.py | 84.21% | 3 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #349 +/- ##
==========================================
+ Coverage 83.52% 83.74% +0.22%
==========================================
Files 159 159
Lines 12575 12724 +149
==========================================
+ Hits 10503 10656 +153
+ Misses 2072 2068 -4
| Flag | Coverage Δ | |
|---|---|---|
| unit_tests | 83.74% <97.19%> (+0.22%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I now made the split/merge analysis optional with the output_splits_merges argument in dating function. Also now the overlap fractions required for matching cells are given as arguments match_frac, split_frac and merge_frac. Before, match_frac was hard-coded to 0.4 (which is now the default value), and split_frac/merge_frac are added in this PR.
Regarding the tests, I added a new argument for the option to get the split/merge output and a some test cases to check that the output is correct. But there are no tests to check the actual functionality.
Nice work! Looks good to me.
@ritvje I included a short example and fixed some deprecation warnings. Let me know what do you think. I think this is now ready to be merged and released with the next pysteps version!
@dnerini, thanks for adding the example! I also think it's ready to be merged.