supervision
Add new MaxInstanceLimiter class for limiting the number of instances during detection
Description
This PR adds a new utility class, MaxInstanceLimiter, which can be used to limit the number of instances while detecting objects in a video. The class includes the following features:
With this new class, it is now possible to limit the number of instances while detecting objects in a video and replace abnormal tracker IDs with the closest missing tracker IDs. The class is easy to use and can be integrated into any object detection pipeline.
related issue: #1072
Type of change
Please delete options that are not relevant.
- [x] New feature (non-breaking change which adds functionality)
- [x] This change requires a documentation update
How has this change been tested, please provide a testcase or example of how you tested the change?
I tested the MaxInstanceLimiter class with a sample video and a set of detections. I created an instance of the MaxInstanceLimiter class with a max_instance_count of 22 and a distance_threshold of 45. I then called the update_with_detections method with the set of detections for each frame of the video. Finally, I checked that the number of instances in the updated detections did not exceed the max_instance_count and that any abnormal tracker IDs were replaced with the closest missing tracker IDs.
Hi @westlinkin 👋
The code is well-structured and nicely documented, but I'm a bit unsure of its purpose. Granted, I didn't have much time to look deeper.
The way I imagine it, if you wanted to select N best detections, you'd filter by confidence. However, this code seems to pick another criterion for determining what to keep. Are you selecting by which object was tracked the longest? Are you filtering out outliers of some sort?
Can you go more in depth on what this approach is intended to address?
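For reference, the confidence-based selection mentioned above could look roughly like the sketch below. It is plain Python, not Supervision code: the `(label, confidence)` tuples and the `keep_top_n` helper are assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical detection record: (label, confidence). Not a Supervision type.
Detection = Tuple[str, float]


def keep_top_n(detections: List[Detection], n: int) -> List[Detection]:
    """Return the n detections with the highest confidence, best first."""
    return sorted(detections, key=lambda d: d[1], reverse=True)[:n]


# Example frame with four detections; only the two most confident survive.
frame = [("a", 0.91), ("b", 0.35), ("c", 0.78), ("d", 0.60)]
top_two = keep_top_n(frame, 2)
```

This criterion looks only at per-frame confidence and knows nothing about track history, which is why it differs from what the PR does.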
Hi @LinasKo
This can be used to limit the number of instances while detecting objects in a video. While I was tracking players on a football field, players kept being assigned new tracker IDs whenever they collided. So I tried limiting the number of players to 22. If a new tracker ID appears, it is reassigned to the closest missing tracker ID, so that the tracker IDs stay limited to 22.
Here are the before and after videos. As you can see, it removes the detected sideline players and avoids re-assigning no.10 -> no.32 and no.14 -> no.26.
Hi @westlinkin,
I've looked over the code and have a few remarks. In short - assuming we fix the bugs, I see the upsides and how it addresses the problem you have, but I'm not convinced it generalizes to other problems well enough.
Let me illustrate:
This is a before & after image, at the 00:06 mark. The algorithm in the "after" image:
+ Correctly removes the non-playing players at the bottom.
+ Correctly hides the outlier (# 43) that arises when 2 players get tackled by a third (bottom left, on the ground).
- It assigns the same label (# 2) to the bottom-right players, so there's a bug somewhere.
However, there's a broader issue, taking other domains into consideration - you may have 22 players on the field, but how can you know that the first 22 trackers will be the right ones? Maybe you start off on the side of the field and detect some outliers, maybe there's a coach who comes into view regularly.
I think there's value in finding ways to say "the trackers I see on frame X" are correct - focus on those from now on. But that would be a tracker consistency filter, and it should not care about the order of tracker_ids.
In terms of your problem, if it works - awesome. If there's a custom model you trained to detect the players, I'd look into splitting the classes into "player_on_field" and "player_outside_field".
For the labels, if numbers 1-22 are needed, a remapping dict should work - exactly like you did.
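A remapping dict of that kind might be sketched as follows. This is an illustration only, not the PR's code: the `IdRemapper` class and its `label_for` method are hypothetical names, and labels are handed out in order of first appearance.

```python
from typing import Dict, Optional


class IdRemapper:
    """Map raw tracker IDs to compact labels 1..max_labels (hypothetical helper)."""

    def __init__(self, max_labels: int) -> None:
        self.max_labels = max_labels
        self.mapping: Dict[int, int] = {}

    def label_for(self, tracker_id: int) -> Optional[int]:
        # Known tracker: always return the label it was given first.
        if tracker_id in self.mapping:
            return self.mapping[tracker_id]
        # All labels are taken: the caller decides whether to hide this track.
        if len(self.mapping) >= self.max_labels:
            return None
        label = len(self.mapping) + 1
        self.mapping[tracker_id] = label
        return label


remapper = IdRemapper(max_labels=2)
labels = [remapper.label_for(t) for t in (14, 26, 14, 40)]
```

Note that this only renames IDs; it does not decide which tracks are the "right" ones.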
Lastly, seeing your video, it looks painful to debug. I'll see if I can find or code up a nicer annotator that'd show the overlaps a bit better.
@SkalskiP, what do you think?
I'm leaning towards not including this in the repo.
Overview:
- The goal is, knowing how many objects we'd have, to show tracks for those and not anything else. (Feel free to correct me, @westlinkin :smile:)
- Algo assumes tracker_ids 1-22 are valid (hence: first 1-22 detections considered good)
- Surplus IDs are hidden, unless something in 1-22 range is missing. In that case -> matches with closest missing tracker (distance to last-known position of missing + only if within distance threshold)
- There's a bug that displays same tracker repeatedly, probably when an old one is re-detected - let's assume that can be fixed.
- Some extra code review would be needed. (line_segment_distance, check if types valid in Py 3.8).
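To make the overview concrete, here is a minimal sketch of the behaviour as I understand it. It is not the PR's implementation: the `InstanceLimiterSketch` class, the `{tracker_id: (x, y)}` frame format and the use of plain point-to-point distance are all assumptions. It also refuses to emit an ID twice in one frame, which the current code apparently does not.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


class InstanceLimiterSketch:
    """Hypothetical stand-in for the idea behind MaxInstanceLimiter."""

    def __init__(self, max_instance_count: int, distance_threshold: float) -> None:
        self.max_instance_count = max_instance_count
        self.distance_threshold = distance_threshold
        self.last_position: Dict[int, Point] = {}  # last seen spot per trusted ID
        self.remap: Dict[int, int] = {}  # surplus ID -> trusted ID

    def update(self, frame: Dict[int, Point]) -> Dict[int, Point]:
        # IDs within 1..max_instance_count are trusted and always shown.
        visible = {
            tracker_id: position
            for tracker_id, position in frame.items()
            if tracker_id <= self.max_instance_count
        }
        # Surplus IDs are hidden unless they can stand in for a missing trusted ID.
        for tracker_id in sorted(frame):
            if tracker_id <= self.max_instance_count:
                continue
            position = frame[tracker_id]
            target = self.remap.get(tracker_id)
            if target is None:
                target = self._closest_missing(position, visible)
            if target is not None and target not in visible:
                self.remap[tracker_id] = target
                visible[target] = position
        self.last_position.update(visible)
        return visible

    def _closest_missing(self, position: Point, visible: Dict[int, Point]) -> Optional[int]:
        # Pick the missing trusted ID whose last-known position is nearest,
        # but only if it lies within the distance threshold.
        best_id, best_distance = None, self.distance_threshold
        for known_id, last in self.last_position.items():
            if known_id in visible:
                continue
            distance = math.dist(position, last)
            if distance <= best_distance:
                best_id, best_distance = known_id, distance
        return best_id


limiter = InstanceLimiterSketch(max_instance_count=22, distance_threshold=45)
first = limiter.update({10: (100.0, 100.0), 14: (200.0, 200.0)})
# ID 14 vanishes, ID 26 appears close to where 14 was, ID 43 appears far away.
second = limiter.update({10: (102.0, 100.0), 26: (205.0, 203.0), 43: (600.0, 600.0)})
```

In the second frame, 26 is shown as 14 and 43 is hidden. The sketch still gives IDs 1-22 the same special treatment discussed below.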
While I like seeing that it eliminated outliers (even one within the field of play), I don't think it's fair to give special treatment to IDs 1-22.
If we wanted to go this route, it'd be better to write a tracker-consistency filter / merger that checks for positions, missing targets nearby, keeps the X most consistently seen objects, reassigns IDs to newly spawned ones, but also handles the case where reassignment was incorrect. A bit complicated.
I think it should be models that are responsible for correct detections, with confidence-based filtering on top.
Hi @LinasKo
I know my case is probably not universal.
At first, I attempted to adjust the parameters of ByteTrack to improve the tracking results, but I kept encountering situations where a tracker ID > 22 appeared (which would be understandable if it only occurred when players were tackled, like at the end of the video). However, what I frequently encountered was the following situation:
no.14 -> no.26 in less than 500ms.
Therefore, I chose an alternative approach to "fix" the tracker ID. I'm really not sure how to write a better tracker-consistency filter/merger so I took this path. 😂
I see! So it's in-field outliers that MaxInstanceLimiter eliminates...
I've just found something that might help with the bug I mentioned previously. Right now, after e.g. track #2 disappears, it might pop back into existence, since a few frames are buffered. Normally, together with your remapper, you'd need an unmapper to handle these cases. ByteTrack has a track_buffer variable, which tells how many frames it can buffer for. If you set it to 0 or 1 - perhaps it'd ditch the tracks right away, so you don't get any duplicate labels?
Otherwise - we're still very grateful for your contribution. The code is really well-structured and does touch on a problem many people will have - making their trackers consistent in a complicated environment with lots of occlusions. I just don't think tracker_ids are enough of an answer :smile:
Hi @LinasKo
Thank you for your reply. I'll definitely try adjusting the track_buffer parameter! Thanks!
Hi @westlinkin 👋🏻 Thank you very much for your interest in Supervision as well as the time you spent preparing this PR. After careful analysis, I have decided to side with @LinasKo's comments and not merge this PR. I share a similar opinion - it seems to me that this tool is not universal enough. I know it works in the case you described, but nevertheless, I believe its place is not in Supervision. Thank you once again for the time invested.
@westlinkin, final point; might help.
TraceAnnotator has a color_lookup param. If you change it to TRACK, it will give each different tracker a different color.
Good luck!
@LinasKo Thank you very much!