sleap Handle duplicate tracks when merging

Currently it looks like we're not appropriately handling duplicate tracks when merging.

From the GUI, this crops up when we merge labels.

From the API, the entrypoint is when you do sleap.load_file(..., match_to=base_labels).

This should be handled downstream here somewhere: https://github.com/talmolab/sleap/blob/ebd2e1ec8f062efaf2588884bb46e92a8030ddcd/sleap/io/format/labels_json.py#L401-L402

The only tricky part is that we may have tracks with the same name that should actually be different tracks.

For identity/appearance-based models, we use the track name to identify that it's the same animal, which is the use case described in #1080.

At a minimum, one fix would be to discard any new empty tracks after a merge operation.
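Here is a minimal sketch of that fix. It uses simplified stand-in classes rather than SLEAP's real ones, so the class layout and the discard_empty_tracks name are hypothetical. After the merge, it keeps only the tracks that at least one instance still references. Tracks are compared by object identity, not by name, because the duplicates share names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical minimal stand-ins for SLEAP's Track, Instance, LabeledFrame and Labels.
@dataclass(eq=False)  # eq=False keeps identity equality: same-named tracks stay distinct.
class Track:
    name: str


@dataclass
class Instance:
    track: Optional[Track] = None


@dataclass
class LabeledFrame:
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Labels:
    labeled_frames: List[LabeledFrame] = field(default_factory=list)
    tracks: List[Track] = field(default_factory=list)


def discard_empty_tracks(labels: Labels) -> List[Track]:
    """Remove tracks that no instance references and return the removed tracks."""
    # Set of Track objects (by identity) that are actually used by some instance.
    used = {
        inst.track
        for lf in labels.labeled_frames
        for inst in lf.instances
        if inst.track is not None
    }
    removed = [t for t in labels.tracks if t not in used]
    labels.tracks = [t for t in labels.tracks if t in used]
    return removed


# Example: a merge left behind an unused duplicate "track_0".
kept, empty = Track("track_0"), Track("track_0")
labels = Labels(labeled_frames=[LabeledFrame([Instance(kept)])], tracks=[kept, empty])
removed = discard_empty_tracks(labels)
# labels.tracks now holds only `kept`; `removed` holds the unused duplicate.
```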

Merging tracks with the same name might be tricky though, so we might want to think about edge cases:

- Same track name, different videos, not identity-based. A track named track_0 in one video and a different track named track_0 in a separate video probably shouldn't be merged.
- Same track name, different videos, identity-based. A track named male_adult in one video and a different track named male_adult in another video should probably be merged. Technically, though, it won't matter downstream for ID models if they're different objects, since we just use the name to match them.
- Same video, not identity-based. Two tracks with the same name in the same video should probably be merged. The edge case is when tracking is run twice and the results are then merged. In that case, track_0 might refer to a different animal in each run, so we might want to do instance-level matching to resolve the differences?
- Same video, identity-based. Tracks with the same name should always be merged.
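For the "same video, not identity-based" case, the issue leaves the instance-level matching method open. One simple option, swapped in here as an assumption, pairs tracks from the two runs by the mean distance between their instance centroids on the frames they share. It then picks the pairing with the lowest total cost. The per-track centroid dictionaries and the function names below are hypothetical and are not SLEAP data structures.

```python
import itertools
import math
from typing import Dict, Tuple

# Hypothetical representation: track name -> {frame_idx: (x, y) instance centroid}.
Centroids = Dict[str, Dict[int, Tuple[float, float]]]


def mean_distance(a: Dict[int, Tuple[float, float]], b: Dict[int, Tuple[float, float]]) -> float:
    """Mean centroid distance over frames where both tracks have an instance."""
    shared = a.keys() & b.keys()
    if not shared:
        return math.inf
    return sum(math.dist(a[f], b[f]) for f in shared) / len(shared)


def match_tracks(run_a: Centroids, run_b: Centroids) -> Dict[str, str]:
    """Map each track name in run_b to the run_a track it most likely corresponds to.

    Brute-force search over all assignments, so only practical for a handful of
    tracks. Assumes run_b has no more tracks than run_a.
    """
    names_a, names_b = list(run_a), list(run_b)
    best_cost, best = math.inf, {}
    for perm in itertools.permutations(names_a, len(names_b)):
        cost = sum(mean_distance(run_b[nb], run_a[na]) for nb, na in zip(names_b, perm))
        if cost < best_cost:
            best_cost, best = cost, dict(zip(names_b, perm))
    return best


# Example: the second tracking run swapped the names of the two animals.
run1 = {"track_0": {0: (0.0, 0.0), 1: (1.0, 0.0)}, "track_1": {0: (10.0, 10.0), 1: (11.0, 10.0)}}
run2 = {"track_0": {0: (10.5, 10.0), 1: (11.0, 10.5)}, "track_1": {0: (0.0, 0.5), 1: (1.0, 0.5)}}
mapping = match_tracks(run1, run2)  # {'track_0': 'track_1', 'track_1': 'track_0'}
```

Brute force over permutations only suits a few tracks. With more tracks, an assignment solver such as the Hungarian algorithm (e.g., scipy.optimize.linear_sum_assignment on the cost matrix) would be the usual choice.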

I think the bigger version of this fix would involve adding a new attribute to Tracks that specifies whether it's a "class" or unique track.
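Here is a rough sketch of how such an attribute could encode the four cases listed above. The is_identity field and the should_merge helper are hypothetical: SLEAP's Track does not currently have this attribute.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Track:
    name: str
    # Hypothetical attribute: True for "class"/identity tracks (e.g. "male_adult"),
    # False for unique per-run tracks (e.g. "track_0"). Not part of SLEAP's Track.
    is_identity: bool = False


def should_merge(a: Track, b: Track, same_video: bool) -> bool:
    """Decide whether two tracks should be merged, following the four cases."""
    if a.name != b.name:
        return False
    if a.is_identity and b.is_identity:
        # Identity-based: the same name means the same animal, in any video.
        return True
    # Not identity-based: merge only within the same video. Across videos,
    # "track_0" in one video is unrelated to "track_0" in another. (Re-tracking
    # the same video may still need instance-level matching.)
    return same_video
```

For example, should_merge(Track("male_adult", True), Track("male_adult", True), same_video=False) is True, while the same call with two plain Track("track_0") objects is False.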

In the meantime, maybe we can add a flag to sleap.load_file (+ a Labels.merge_tracks(by_name: bool = True) instance method) and a GUI option that allows the user to specify whether to merge tracks by name. This could be done post-merge via a menu item in the Tracks menu, but we could also add a convenience checkbox in the merge resolution window.
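The proposed Labels.merge_tracks(by_name=True) could look roughly like the function below. This is a sketch against hypothetical stand-in classes, not SLEAP's API. It keeps the first Track listed for each name, so when the base project's tracks come first, its Track objects win.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical minimal stand-ins for SLEAP classes.
@dataclass(eq=False)  # Identity equality: same-named tracks are distinct objects.
class Track:
    name: str


@dataclass
class Instance:
    track: Optional[Track] = None


@dataclass
class LabeledFrame:
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Labels:
    labeled_frames: List[LabeledFrame] = field(default_factory=list)
    tracks: List[Track] = field(default_factory=list)


def merge_tracks(labels: Labels, by_name: bool = True) -> List[Track]:
    """Collapse same-named tracks onto the first Track with that name.

    Sketch of the proposed Labels.merge_tracks(by_name=True); returns the
    duplicate Track objects that were dropped from labels.tracks.
    """
    if not by_name:
        return []  # Only merging by name is sketched here.
    # The first Track seen for each name (in labels.tracks order) becomes canonical.
    canonical: Dict[str, Track] = {}
    for track in labels.tracks:
        canonical.setdefault(track.name, track)
    removed = [t for t in labels.tracks if canonical[t.name] is not t]
    # Re-point every instance at the canonical Track for its name.
    for lf in labels.labeled_frames:
        for inst in lf.instances:
            if inst.track is not None:
                inst.track = canonical.setdefault(inst.track.name, inst.track)
    labels.tracks = list(canonical.values())
    return removed


# Example: after a merge, "track_0" exists twice.
t0, t0_dup, t1 = Track("track_0"), Track("track_0"), Track("track_1")
labels = Labels(
    labeled_frames=[LabeledFrame([Instance(t0), Instance(t1)]), LabeledFrame([Instance(t0_dup)])],
    tracks=[t0, t1, t0_dup],
)
removed = merge_tracks(labels)
# labels.tracks is now [t0, t1]; the second frame's instance now points at t0.
```

Reusing an existing Track object instead of creating a fresh one means other references to the canonical tracks stay valid.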

For reference, I wrote a Colab that does the merging by track name (but also some other reindexing): https://colab.research.google.com/drive/13DAiPiLq4_8suZOlPoD67ReIJzjN8oCW?usp=sharing

The core logic for merging by name is something like:

```python
import sleap

# Load the base project whose tracks should be reused.
base_labels = sleap.load_file("base_labels.slp")

# Make a track name -> Track object map from the base project's tracks.
reference_tracks = {track.name: track for track in base_labels.tracks}

# Load the new labels (example: predictions), matched to the base project.
new_labels = sleap.load_file("new_labels.slp", match_to=base_labels)

# Update the track references to use the reference tracks to prevent duplication.
for lf in new_labels:
    for instance in lf:
        if instance.track is not None:
            if instance.track.name in reference_tracks:
                instance.track = reference_tracks[instance.track.name]
            else:
                reference_tracks[instance.track.name] = instance.track

# Now do the merge for all the other data structures.
```

Discussed in https://github.com/talmolab/sleap/discussions/1080

Originally posted by lisadiez on December 14, 2022

Hello,

We are trying to merge two .slp projects together, but we're getting duplicate tracks.

We first tried File --> Merge into Project... directly from the GUI. No conflicts came up, but we still got the duplicate tracks. We also tried some code that Talmo provided a few months ago, but that didn't work either.

We're using v1.2.9.

Here's a folder with the videos and the two slp files: https://drive.google.com/drive/folders/1OPLs3iyld6Ib3iJYUKKxhj4a_h1mCgWM?usp=sharing

Do you have a workaround for this?

Thank you!! Lisa

Dec 15 '22 20:12 talmo

Hi @talmo, thanks for tracking this issue.

In our use case, the problem is a bit more complex, since the two .slp files contain different types of data (perhaps this is why SLEAP is not de-duplicating properly).

~~1. .slp file 1 comes from human annotation over 10 videos and contains 1,000 human-labeled frames out of 300,000. Ideally, we'd like to keep just the 1,000 human-labeled frames and make a "subsampled human .slp" file (to be merged below).~~ DONE (code to be posted)

~~2. .slp file 2 comes from predictions on several videos. The Colab you made for us a few days ago works: it loops over these .slp files, extracts only the frames we want (from a previously computed list), and then makes a "subsampled predicted .slp" file containing only the predictions we want.~~ DONE (Talmo's code)

3. We would then like to merge the subsampled human-labeled .slp file with the subsampled predictions .slp file. It's not clear whether your code, which works for 2 above, will also handle 3 (and we might also need separate code for 1). TO CONFIRM (whether Talmo's code works).

I hope this makes sense; let us know otherwise. Thanks so much! catubc

Dec 21 '22 07:12 catubc

An unofficial gist provides an interim solution.

Related discussions:

- #1557
- #1558

Oct 17 '23 20:10 roomrys