tobac icon indicating copy to clipboard operation
tobac copied to clipboard

Merge Split algorithm added

Open kelcyno opened this issue 2 years ago • 23 comments

The addition in this pull request is a post processing algorithm for the tobac tracking. This file contains the merge_split function to address merging/splitting cells in the tobac track, as well as a quick compression function, and a function to standardize the track dataset in order to have all tracking, feature, and segmentation mask datasets in one xarray dataset. An example of these functions is given below:

d = merge_split(Track) ds = standardize_track_dataset(Track, refl_mask, data['ProjectionCoordinateSystem']) both_ds = xr.merge([ds, d],compat ='override') both_ds = compress_all(both_ds) both_ds.to_netcdf(os.path.join(savedir,'Track_features_merges.nc'))

kelcyno avatar Jun 06 '22 21:06 kelcyno

@freemansw1 Nothing was going right on my end so I just started fresh. Let me know if there's anything wonky.

kelcyno avatar Jun 06 '22 21:06 kelcyno

@kelcyno Great thank you, I will also start to have a look your PR now! Is it maybe possible to add tests for your additions (or at least one for the merge_split function)?

JuliaKukulies avatar Jun 08 '22 09:06 JuliaKukulies

Sorry for my delay on this, I should be able to review next week after my defense. @kelcyno do you want me to review ASAP or are there changes to be made from @JuliaKukulies's review?

freemansw1 avatar Jun 22 '22 03:06 freemansw1

Sean, I still have some edits from Julia to work on. It can wait until after your defense.

kelcyno avatar Jun 22 '22 05:06 kelcyno

OK, I'll wait to review this until you have made the changes @kelcyno . When you do, we can re-request reviews from both @JuliaKukulies and myself.

One note, with only having glanced at the PR, if we need to add new requirements for this, we should either make them optional (add a check to see if the package can be imported when trying to use this function; if not throw an error and quit) or add them to the various requirements files.

freemansw1 avatar Jun 27 '22 18:06 freemansw1

@freemansw1 @JuliaKukulies I updated the merge_split.py to comply with documentation and Julia's previous comments. Looking forward to more input/comments.

kelcyno avatar Jul 03 '22 03:07 kelcyno

Codecov Report

Base: 36.22% // Head: 38.97% // Increases project coverage by +2.74% :tada:

Coverage data is based on head (800fdfe) compared to base (720ef18). Patch coverage: 85.60% of modified lines in pull request are covered.

Additional details and impacted files
@@              Coverage Diff              @@
##           RC_v1.4.0     #136      +/-   ##
=============================================
+ Coverage      36.22%   38.97%   +2.74%     
=============================================
  Files             10       11       +1     
  Lines           2120     2245     +125     
=============================================
+ Hits             768      875     +107     
- Misses          1352     1370      +18     
Flag Coverage Δ
unittests 38.97% <85.60%> (+2.74%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
tobac/utils.py 48.28% <13.33%> (-2.12%) :arrow_down:
tobac/merge_split.py 95.41% <95.41%> (ø)
tobac/__init__.py 92.30% <100.00%> (+0.30%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Jul 03 '22 03:07 codecov[bot]

Thanks @kelcyno , I'll take a look at this today or tomorrow. Just so you know- the reason your PR is failing checks is that you need to run the black formatter on your new file.

One question before I begin review- do you have documentation handy for this?

freemansw1 avatar Jul 06 '22 22:07 freemansw1

Just noting that I know you are still waiting on a review from me. It's been very busy and I haven't quite had the time to give reviewing this the attention it deserves. I should have time early next week, though. I do have some ideas both on the documentation front and the testing front though, and I'll post those when I post my review.

freemansw1 avatar Jul 26 '22 01:07 freemansw1

I think given the number of things left to resolve, I would be inclined to push this to v1.5? That would give us documentation and spectral filtering in v1.4. What do you think @JuliaKukulies @w-k-jones @fsenf

freemansw1 avatar Aug 04 '22 19:08 freemansw1

That sounds like a good idea if @kelcyno is fine with that as well. Thanks for the additional comments @freemansw1!

JuliaKukulies avatar Aug 05 '22 14:08 JuliaKukulies

@kelcyno Are you ready for a re-review on this, or are you still iterating? Sorry, I've been busy and lost track of this.

freemansw1 avatar Sep 07 '22 23:09 freemansw1

Just to note where we are, I think @JuliaKukulies and I are waiting for a few last changes and the documentation file before we re-review, right @kelcyno ?

freemansw1 avatar Sep 13 '22 22:09 freemansw1

@freemansw1 @JuliaKukulies I've added documentation and recent changes we discussed in the last tobac meeting. Ready for another round of reviews!

kelcyno avatar Sep 18 '22 16:09 kelcyno

@kelcyno can you do me a favor and merge the latest RC_v1.4.0 changes into this PR? I can also do it from my side, but it's probably easier for you to do it from yours.

freemansw1 avatar Sep 26 '22 12:09 freemansw1

Never mind, I figured it out and it went through really quickly.

freemansw1 avatar Sep 27 '22 14:09 freemansw1

Great contribution @kelcyno ! I think this is a really interesting functionality and that's why I wanted to try it. This is the code to create the small dataset with a merge, which might be a good fit for a test:

import pandas as pd
import tobac
import tobac.testing

cell_1 = tobac.testing.generate_single_feature(
        1,
        1,
        min_h1=0,
        max_h1=100,
        min_h2=0,
        max_h2=100,
        frame_start=0,
        num_frames=2,
        spd_h1=20,
        spd_h2=20,
    )
cell_1['feature'] = [1, 3]

cell_2 = tobac.testing.generate_single_feature(
        1,
        100,
        min_h1=0,
        max_h1=100,
        min_h2=0,
        max_h2=100,
        frame_start=0,
        num_frames=2,
        spd_h1=20,
        spd_h2=-20,
    )
cell_2['feature'] = [2, 4]
cell_3 = tobac.testing.generate_single_feature(
        30,
        50,
        min_h1=0,
        max_h1=100,
        min_h2=0,
        max_h2=100,
        frame_start=2,
        num_frames=2,
        spd_h1=20,
        spd_h2=0,
    )
cell_3['feature'] = [5, 6]
features = pd.concat([cell_1, cell_2, cell_3])

output = tobac.linking_trackpy(
        features, None, 1, 1, v_max=30
    )

merge

I tried to keep the feature ids matching the frames, but I don't know if that's important. My idea would now be to infer the output that merge_split_cells() should return from the dataset and the picture, put that expected output into the format

    expected_d = xr.Dataset(
        {
            "track": (track_dim, track_ids),
            "cell": (cell_dim, cell_id),
            "cell_parent_track_id": (cell_dim, cell_parent_track_id),
            "feature": (feature_dim, feature_id),
            "feature_parent_cell_id": (feature_dim, feature_parent_cell_id),
            "feature_parent_track_id": (feature_dim, feature_parent_track_id),
            "track_child_cell_count": (track_dim, track_child_cell_count),
            "cell_child_feature_count": (cell_dim, cell_child_feature_count),
        }
    )

and assert the real output in a test function.

Unfortunately, I can't get past the assert statements at the end of merge_split_cells(), which could mean that something is not working properly or that I have gotten something wrong and the dataset needs to be be altered. In either case, I would be happy to contribute further.

snilsn avatar Oct 07 '22 18:10 snilsn

@snilsn I did not update the assertion at the end when I updated how I was numbering tracks. I've tried it with your cells and your test dataset made it pass all assert statements. Do you mind continuing to work on the testing aspect? Should we move the assert statements from the end of merge_split_cells? I believe @freemansw1 mentioned at some point that this might be the correct way to handle testing.

kelcyno avatar Oct 11 '22 16:10 kelcyno

I think the approach that @snilsn has for testing is probably the way to go, and is an excellent contribution to this. I believe using his tests, you can remove the assert statements out of the main function and instead do all testing in the testing code.

freemansw1 avatar Oct 12 '22 16:10 freemansw1

@kelcyno You are right, it works, no idea what happened a week ago.

Now that I've seen the output, I'm not sure if I could tell if this is the right result (e.g. the track id is -1, not sure what that means). I think it would be easier for you to finish the test, since you know what the result should be. It would take a while for me to get used to this parent/child notation and I have almost zero experience with xarray.

snilsn avatar Oct 14 '22 14:10 snilsn

Hi @kelcyno! Thanks a lot for updating the documentation page. I committed some minor changes to your merge_split branch:

  1. I removed the text you had accidently copied in twice in merge_split.rst, fixed some formatting issues in the .rst file and removed the redundant merge_split_output_vars.rst
  2. I also solved the merge conflict (caused by the utils module) by pulling in the latest changes from RC_v1.4.0

I hope it was OK that I directly committed the changes, please let me know if you are not happy with them.

Regarding the test: I think we can work on this collaboratively given that you were not sure about how to add the unit tests etc. I am also happy to help with this and I think @snilsn already provided the main code needed. But as @snilsn says, maybe you can help with describing what we would expect. @snilsn I also tested your example and I since a track ID of -1 simply means that no cell has been merged (which is clearly not the desired output), I am wondering if this has something to do with the test data cells. If I use your test data, tobac.linking_trackpy() gives me three individual cell IDs with no feature sharing the same position. This is different from what your figure suggest, because you indicate only two cells (one orange and one blue), while I am getting this:

merge_split_test_data

JuliaKukulies avatar Oct 25 '22 08:10 JuliaKukulies

@JuliaKukulies Yes, you are right, I did multiple versions of this in a notebook and have probably used the wrong picture. I increased v_max to get only two cells, but that doesn't change the track ID. My next guess would be to increase the distance parameter of merge_split_cells(). Maybe the test data has an awkward spacing?

snilsn avatar Oct 25 '22 09:10 snilsn

Good point @snilsn and thanks for testing this one more time. I believe that you can be right about the distance parameter which is set to 25000 m by default and is compared to the actual distance between your features that depend on what you had chosen as dxy (maybe we should change this ? - see my comment below)

JuliaKukulies avatar Oct 25 '22 10:10 JuliaKukulies

OK, I've updated the PR with the latest changes from RC_v1.4.0 and re-run black for the utils.py file.

On the testing front, I'll take a stab at adding a test (tests?) based on what @snilsn has done above and will PR to @kelcyno's branch. That seems like the best path forward to me? Sorry for pushing on this; I'm just very eager to get 1.4.0 (and 1.5.0) out.

freemansw1 avatar Oct 31 '22 15:10 freemansw1

That sounds good to me @freemansw1. Thank you for taking initiative on the test!

JuliaKukulies avatar Oct 31 '22 15:10 JuliaKukulies

On the testing front, I'll take a stab at adding a test (tests?) based on what @snilsn has done above and will PR to @kelcyno's branch. That seems like the best path forward to me?

If @kelcyno does not have time to review and is OK with this, you could also directly commit your test to this PR like you have done @freemansw1 ( https://tighten.com/blog/adding-commits-to-a-pull-request/ )

JuliaKukulies avatar Oct 31 '22 15:10 JuliaKukulies

Note: the merge_split.py file isn't appearing in the documentation; I'm not sure why this is, but I need to fix it before merging.

freemansw1 avatar Oct 31 '22 15:10 freemansw1

@freemansw1 I'm not a huge fan of only making the distance threshold mandatory. I think it's important to have a default option, as the other functions all have default values if no keyword is given. We can make it clear in the documentation that inputting the users own distance is recommended, but having a value ~ 10x the dxy term as default is what I prefer. You're within your purview to override me here, but I think we can do a lot with documentation here to help the user.

kelcyno avatar Oct 31 '22 15:10 kelcyno

@JuliaKukulies I can review the PR. Real quick, to make sure our labeling is consistent - the Feature numbering begins at 1, the cell numbering begins at 0 or 1? I've been initiating the Track numbering at 0 but I'm wondering if I need to make it start at 1 to be consistent. Thoughts?

kelcyno avatar Oct 31 '22 15:10 kelcyno

@kelcyno That's a reasonable perspective. I will note that we probably should have fewer default values, but in general, having a default value of None (that we set to 10x dxy in the code) would be a great compromise. If we decide not to have a default in the future, that's a lot easier to undo than requiring the parameter (see our #125 headache).

@kelcyno I'm about to open a PR onto your branch with my test (and a few other changes, too).

freemansw1 avatar Oct 31 '22 16:10 freemansw1