tobac
tobac copied to clipboard
Merge Split algorithm added
The addition in this pull request is a post processing algorithm for the tobac tracking. This file contains the merge_split function to address merging/splitting cells in the tobac track, as well as a quick compression function, and a function to standardize the track dataset in order to have all tracking, feature, and segmentation mask datasets in one xarray dataset. An example of these functions is given below:
d = merge_split(Track) ds = standardize_track_dataset(Track, refl_mask, data['ProjectionCoordinateSystem']) both_ds = xr.merge([ds, d],compat ='override') both_ds = compress_all(both_ds) both_ds.to_netcdf(os.path.join(savedir,'Track_features_merges.nc'))
@freemansw1 Nothing was going right on my end so I just started fresh. Let me know if there's anything wonky.
@kelcyno Great thank you, I will also start to have a look your PR now! Is it maybe possible to add tests for your additions (or at least one for the merge_split
function)?
Sorry for my delay on this, I should be able to review next week after my defense. @kelcyno do you want me to review ASAP or are there changes to be made from @JuliaKukulies's review?
Sean, I still have some edits from Julia to work on. It can wait until after your defense.
OK, I'll wait to review this until you have made the changes @kelcyno . When you do, we can re-request reviews from both @JuliaKukulies and myself.
One note, with only having glanced at the PR, if we need to add new requirements for this, we should either make them optional (add a check to see if the package can be imported when trying to use this function; if not throw an error and quit) or add them to the various requirements files.
@freemansw1 @JuliaKukulies I updated the merge_split.py to comply with documentation and Julia's previous comments. Looking forward to more input/comments.
Codecov Report
Base: 36.22% // Head: 38.97% // Increases project coverage by +2.74%
:tada:
Coverage data is based on head (
800fdfe
) compared to base (720ef18
). Patch coverage: 85.60% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## RC_v1.4.0 #136 +/- ##
=============================================
+ Coverage 36.22% 38.97% +2.74%
=============================================
Files 10 11 +1
Lines 2120 2245 +125
=============================================
+ Hits 768 875 +107
- Misses 1352 1370 +18
Flag | Coverage Δ | |
---|---|---|
unittests | 38.97% <85.60%> (+2.74%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
tobac/utils.py | 48.28% <13.33%> (-2.12%) |
:arrow_down: |
tobac/merge_split.py | 95.41% <95.41%> (ø) |
|
tobac/__init__.py | 92.30% <100.00%> (+0.30%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Thanks @kelcyno , I'll take a look at this today or tomorrow. Just so you know- the reason your PR is failing checks is that you need to run the black
formatter on your new file.
One question before I begin review- do you have documentation handy for this?
Just noting that I know you are still waiting on a review from me. It's been very busy and I haven't quite had the time to give reviewing this the attention it deserves. I should have time early next week, though. I do have some ideas both on the documentation front and the testing front though, and I'll post those when I post my review.
I think given the number of things left to resolve, I would be inclined to push this to v1.5? That would give us documentation and spectral filtering in v1.4. What do you think @JuliaKukulies @w-k-jones @fsenf
That sounds like a good idea if @kelcyno is fine with that as well. Thanks for the additional comments @freemansw1!
@kelcyno Are you ready for a re-review on this, or are you still iterating? Sorry, I've been busy and lost track of this.
Just to note where we are, I think @JuliaKukulies and I are waiting for a few last changes and the documentation file before we re-review, right @kelcyno ?
@freemansw1 @JuliaKukulies I've added documentation and recent changes we discussed in the last tobac meeting. Ready for another round of reviews!
@kelcyno can you do me a favor and merge the latest RC_v1.4.0
changes into this PR? I can also do it from my side, but it's probably easier for you to do it from yours.
Never mind, I figured it out and it went through really quickly.
Great contribution @kelcyno ! I think this is a really interesting functionality and that's why I wanted to try it. This is the code to create the small dataset with a merge, which might be a good fit for a test:
import pandas as pd
import tobac
import tobac.testing
cell_1 = tobac.testing.generate_single_feature(
1,
1,
min_h1=0,
max_h1=100,
min_h2=0,
max_h2=100,
frame_start=0,
num_frames=2,
spd_h1=20,
spd_h2=20,
)
cell_1['feature'] = [1, 3]
cell_2 = tobac.testing.generate_single_feature(
1,
100,
min_h1=0,
max_h1=100,
min_h2=0,
max_h2=100,
frame_start=0,
num_frames=2,
spd_h1=20,
spd_h2=-20,
)
cell_2['feature'] = [2, 4]
cell_3 = tobac.testing.generate_single_feature(
30,
50,
min_h1=0,
max_h1=100,
min_h2=0,
max_h2=100,
frame_start=2,
num_frames=2,
spd_h1=20,
spd_h2=0,
)
cell_3['feature'] = [5, 6]
features = pd.concat([cell_1, cell_2, cell_3])
output = tobac.linking_trackpy(
features, None, 1, 1, v_max=30
)
I tried to keep the feature ids matching the frames, but I don't know if that's important. My idea would now be to infer the output that merge_split_cells()
should return from the dataset and the picture,
put that expected output into the format
expected_d = xr.Dataset(
{
"track": (track_dim, track_ids),
"cell": (cell_dim, cell_id),
"cell_parent_track_id": (cell_dim, cell_parent_track_id),
"feature": (feature_dim, feature_id),
"feature_parent_cell_id": (feature_dim, feature_parent_cell_id),
"feature_parent_track_id": (feature_dim, feature_parent_track_id),
"track_child_cell_count": (track_dim, track_child_cell_count),
"cell_child_feature_count": (cell_dim, cell_child_feature_count),
}
)
and assert the real output in a test function.
Unfortunately, I can't get past the assert statements at the end of merge_split_cells()
, which could mean that something is not working properly or that I have gotten something wrong and the dataset needs to be be altered. In either case, I would be happy to contribute further.
@snilsn I did not update the assertion at the end when I updated how I was numbering tracks. I've tried it with your cells and your test dataset made it pass all assert statements. Do you mind continuing to work on the testing aspect? Should we move the assert statements from the end of merge_split_cells? I believe @freemansw1 mentioned at some point that this might be the correct way to handle testing.
I think the approach that @snilsn has for testing is probably the way to go, and is an excellent contribution to this. I believe using his tests, you can remove the assert
statements out of the main function and instead do all testing in the testing code.
@kelcyno You are right, it works, no idea what happened a week ago.
Now that I've seen the output, I'm not sure if I could tell if this is the right result (e.g. the track id is -1, not sure what that means). I think it would be easier for you to finish the test, since you know what the result should be. It would take a while for me to get used to this parent/child notation and I have almost zero experience with xarray.
Hi @kelcyno! Thanks a lot for updating the documentation page. I committed some minor changes to your merge_split
branch:
- I removed the text you had accidently copied in twice in
merge_split.rst
, fixed some formatting issues in the .rst file and removed the redundantmerge_split_output_vars.rst
- I also solved the merge conflict (caused by the utils module) by pulling in the latest changes from
RC_v1.4.0
I hope it was OK that I directly committed the changes, please let me know if you are not happy with them.
Regarding the test: I think we can work on this collaboratively given that you were not sure about how to add the unit tests etc. I am also happy to help with this and I think @snilsn already provided the main code needed. But as @snilsn says, maybe you can help with describing what we would expect. @snilsn I also tested your example and I since a track ID of -1
simply means that no cell has been merged (which is clearly not the desired output), I am wondering if this has something to do with the test data cells. If I use your test data, tobac.linking_trackpy()
gives me three individual cell IDs with no feature sharing the same position. This is different from what your figure suggest, because you indicate only two cells (one orange and one blue), while I am getting this:
@JuliaKukulies Yes, you are right, I did multiple versions of this in a notebook and have probably used the wrong picture.
I increased v_max
to get only two cells, but that doesn't change the track ID. My next guess would be to increase the distance
parameter of merge_split_cells()
. Maybe the test data has an awkward spacing?
Good point @snilsn and thanks for testing this one more time. I believe that you can be right about the distance
parameter which is set to 25000 m by default and is compared to the actual distance between your features that depend on what you had chosen as dxy
(maybe we should change this ? - see my comment below)
OK, I've updated the PR with the latest changes from RC_v1.4.0
and re-run black for the utils.py
file.
On the testing front, I'll take a stab at adding a test (tests?) based on what @snilsn has done above and will PR to @kelcyno's branch. That seems like the best path forward to me? Sorry for pushing on this; I'm just very eager to get 1.4.0 (and 1.5.0) out.
That sounds good to me @freemansw1. Thank you for taking initiative on the test!
On the testing front, I'll take a stab at adding a test (tests?) based on what @snilsn has done above and will PR to @kelcyno's branch. That seems like the best path forward to me?
If @kelcyno does not have time to review and is OK with this, you could also directly commit your test to this PR like you have done @freemansw1 ( https://tighten.com/blog/adding-commits-to-a-pull-request/ )
Note: the merge_split.py
file isn't appearing in the documentation; I'm not sure why this is, but I need to fix it before merging.
@freemansw1 I'm not a huge fan of only making the distance threshold mandatory. I think it's important to have a default option, as the other functions all have default values if no keyword is given. We can make it clear in the documentation that inputting the users own distance is recommended, but having a value ~ 10x the dxy term as default is what I prefer. You're within your purview to override me here, but I think we can do a lot with documentation here to help the user.
@JuliaKukulies I can review the PR. Real quick, to make sure our labeling is consistent - the Feature numbering begins at 1, the cell numbering begins at 0 or 1? I've been initiating the Track numbering at 0 but I'm wondering if I need to make it start at 1 to be consistent. Thoughts?
@kelcyno That's a reasonable perspective. I will note that we probably should have fewer default values, but in general, having a default value of None
(that we set to 10x dxy
in the code) would be a great compromise. If we decide not to have a default in the future, that's a lot easier to undo than requiring the parameter (see our #125 headache).
@kelcyno I'm about to open a PR onto your branch with my test (and a few other changes, too).