pysteps icon indicating copy to clipboard operation
pysteps copied to clipboard

Consider splits and merges in tdating

Open ritvje opened this issue 1 year ago • 3 comments

Hi! I've been working on including information on splits and merges of storm cells in tdating. This is a somewhat working version of that. Here currently,

  • a cell at time t is considered to split, if the advected version of the cell overlaps more than 10% with more than one cell at time t+1
  • a cell at time t is considered to be merged, if more than one advected cell (advected from t to t+1) overlaps more than 10% with it

The results are stored as additional columns in the cell dataframes:

  • splitted (bool): cell is considered split
  • split_IDs (list): list of IDs at next timestep that the cell split to
  • merged (bool): cell is considered a merge of multiple cells
  • merged_IDs (list): list of IDs from previous timestep that merged into this cell
  • results_from_split (bool): convenience attribute that is True if the cell is a result of split (i.e. the ID of the cell is present in the split_IDs of some cell at previous timestep)
  • will_merge (bool): convenience attribute that is True if the cell will merge at next timestep (i.e. the ID of the cell is present in the merge_IDs of some cell at next timestep). Will of course be empty, if the next timestep is not tracked

Additionally, this PR contains a bug fix in the pysteps.tracking.tdating.match function. As far as I can tell, in this bug the label 0 was included in possible match IDs, even though 0 denotes pixels that are not part of any cell. So there could be cases, where the advected cell would have e.g. 55% of 0 and 45% of some cell and it would not be matched to that cell. In my opinion, this bug needs to be fixed in the codebase even if the merge/split is not.

The code could probably be improved and e.g. it would benefit from some unit tests. But I'm creating this PR already for discussion especially for @pulkkins and @dnerini since this relates to my current work that you're involved in.

ritvje avatar Mar 07 '24 15:03 ritvje

Very nice contribution @ritvje ! I hope to be able to look into it soon, meanwhile let's ping in the original author of this routine

@feldmann-m in case you wants to leave a feedback on this PR

dnerini avatar Mar 07 '24 19:03 dnerini

Additionally, this PR contains a bug fix in the pysteps.tracking.tdating.match function. As far as I can tell, in this bug the label 0 was included in possible match IDs, even though 0 denotes pixels that are not part of any cell. So there could be cases, where the advected cell would have e.g. 55% of 0 and 45% of some cell and it would not be matched to that cell. In my opinion, this bug needs to be fixed in the codebase even if the merge/split is not.

@ritvje would it be possible to fix this bug in a separate PR?

dnerini avatar Mar 12 '24 15:03 dnerini

Sure, I made a new PR with that commit.

ritvje avatar Mar 14 '24 12:03 ritvje

dear @ritvje, next week @ladc and colleagues are organizing a pysteps hackathon, see https://github.com/orgs/pySTEPS/projects/2

I was thinking that this might be a good opportunity to advance on this PR? what do you think?

dnerini avatar Jul 12 '24 14:07 dnerini

Sure, it would be good to finally advance this! I should have time to work on this PR this week on Wednesday or Thursday (most likely evening-time).

Regarding the failing tests, I guess the main issue is the newly-added columns for the cell dataframe. Like @dnerini suggested, those (or the whole splits/merges analysis, since it's irrelevant if the information is not returned?) can be made optional.

ritvje avatar Jul 15 '24 11:07 ritvje

Codecov Report

Attention: Patch coverage is 97.19626% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.74%. Comparing base (953f799) to head (5c8fdce). Report is 3 commits behind head on master.

Files Patch % Lines
pysteps/tests/test_feature_tstorm.py 84.21% 3 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   83.52%   83.74%   +0.22%     
==========================================
  Files         159      159              
  Lines       12575    12724     +149     
==========================================
+ Hits        10503    10656     +153     
+ Misses       2072     2068       -4     
Flag Coverage Δ
unit_tests 83.74% <97.19%> (+0.22%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jul 17 '24 12:07 codecov[bot]

I now made the split/merge analysis optional with the output_splits_merges argument in dating function. Also now the overlap fractions required for matching cells are given as arguments match_frac, split_frac and merge_frac. Before, match_frac was hard-coded to 0.4 (which is now the default value), and split_frac/merge_frac are added in this PR.

Regarding the tests, I added a new argument for the option to get the split/merge output and a some test cases to check that the output is correct. But there are no tests to check the actual functionality.

ritvje avatar Jul 17 '24 13:07 ritvje

Nice work! Looks good to me.

pulkkins avatar Jul 18 '24 14:07 pulkkins

@ritvje I included a short example and fixed some deprecation warnings. Let me know what do you think. I think this is now ready to be merged and released with the next pysteps version!

dnerini avatar Jul 20 '24 11:07 dnerini

@dnerini, thanks for adding the example! I also think it's ready to be merged.

ritvje avatar Jul 22 '24 05:07 ritvje