Detect new objects and keep tracking old objects

Open Greywan opened this issue 1 year ago • 3 comments

Hello, thanks for this wonderful and useful project. I was wondering if it's possible to keep detecting new objects (e.g. by giving a box prompt) and tracking them, while keeping track of the old ones (even if they temporarily disappear due to occlusion). Thanks in advance!

Greywan, Sep 23 '24 08:09

Thanks for checking out the repo!

Yes it's possible to track multiple objects. The video_segmentation example script has the code needed for handling a single object. The code that's needed to 'start' tracking an object is this part:

from collections import deque

# Get initial detection/memory data for an object
init_mask, init_mem, init_ptr = sammodel.initialize_video_masking(
    init_encoded_img, boxes_tlbr_norm_list, fg_xy_norm_list, bg_xy_norm_list
)
prompt_mems = deque([init_mem])
prompt_ptrs = deque([init_ptr])
prev_mems = deque([], maxlen=6)
prev_ptrs = deque([], maxlen=15)

And then the tracking code is this part:

# Update tracking of a single object
obj_score, best_mask_idx, mask_preds, mem_enc, obj_ptr = sammodel.step_video_masking(
    encoded_imgs_list, prompt_mems, prompt_ptrs, prev_mems, prev_ptrs
)
prev_mems.appendleft(mem_enc)
prev_ptrs.appendleft(obj_ptr)

So each new object would need its own copy of the prompt_mems, prompt_ptrs, prev_mems and prev_ptrs variables, and these would just need to be updated in a loop while processing frames. The SAMv2 model already handles occlusions quite well, but you may want to stop recording the memory data (i.e. skip the .appendleft(...) calls above) whenever the obj_score is less than 0. This helps to avoid having bad data corrupt the memory when the object disappears.
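As a rough illustration of the per-object bookkeeping described above, here is a minimal sketch. The only calls taken from the snippets are initialize_video_masking and step_video_masking. Everything else (ObjectMemory, start_tracking, step_all_objects and the FakeSamModel stand-in) is a hypothetical name made up for this example, and the stand-in returns dummy values so the sketch runs without the real model. It also assumes obj_score can be compared to 0 like a plain number.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ObjectMemory:
    """Hypothetical container: one set of the four buffers per tracked object."""
    prompt_mems: deque
    prompt_ptrs: deque
    prev_mems: deque = field(default_factory=lambda: deque([], maxlen=6))
    prev_ptrs: deque = field(default_factory=lambda: deque([], maxlen=15))


def start_tracking(sammodel, init_encoded_img, boxes_tlbr_norm_list, fg_xy_norm_list, bg_xy_norm_list):
    """Create fresh memory buffers for one newly prompted object."""
    _init_mask, init_mem, init_ptr = sammodel.initialize_video_masking(
        init_encoded_img, boxes_tlbr_norm_list, fg_xy_norm_list, bg_xy_norm_list
    )
    return ObjectMemory(deque([init_mem]), deque([init_ptr]))


def step_all_objects(sammodel, encoded_imgs_list, memory_by_name):
    """Run one frame of tracking for every object, each with its own buffers."""
    results = {}
    for name, mem in memory_by_name.items():
        obj_score, best_mask_idx, mask_preds, mem_enc, obj_ptr = sammodel.step_video_masking(
            encoded_imgs_list, mem.prompt_mems, mem.prompt_ptrs, mem.prev_mems, mem.prev_ptrs
        )
        # Only record memory when the object appears to be present (score >= 0),
        # so frames where it is occluded do not corrupt the stored history
        if not obj_score < 0:
            mem.prev_mems.appendleft(mem_enc)
            mem.prev_ptrs.appendleft(obj_ptr)
        results[name] = (obj_score, best_mask_idx, mask_preds)
    return results


class FakeSamModel:
    """Stand-in for illustration only, NOT the real model. Returns dummy data."""

    def __init__(self, scores):
        self._scores = iter(scores)

    def initialize_video_masking(self, encoded_img, boxes, fg_pts, bg_pts):
        return "init_mask", "init_mem", "init_ptr"

    def step_video_masking(self, encoded_imgs_list, prompt_mems, prompt_ptrs, prev_mems, prev_ptrs):
        score = next(self._scores)
        return score, 0, ["mask"], f"mem@{score}", f"ptr@{score}"


if __name__ == "__main__":
    model = FakeSamModel(scores=[2.5, 1.0, -3.0, 0.5, 4.0, 2.0])
    box = [((0.1, 0.1), (0.5, 0.5))]  # made-up normalized box prompt

    # Frame 1: start with one object
    objects = {"horse_a": start_tracking(model, None, box, [], [])}
    step_all_objects(model, None, objects)

    # Frame 2: a new object is prompted mid-video and simply joins the dict
    objects["horse_b"] = start_tracking(model, None, box, [], [])
    step_all_objects(model, None, objects)

    # Frame 3: both objects keep updating independently
    step_all_objects(model, None, objects)
    for name, mem in objects.items():
        print(name, "stored memories:", len(mem.prev_mems))
```

With the real model, the None placeholders would be replaced by the encoded frame data used in the video_segmentation example script. Keeping the buffers in a dictionary means a newly prompted object can be added at any frame without touching the state of the objects already being tracked.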

Alternatively, if you just want the tracking and don't need the code, you can use the run_video script, which can keep track of multiple objects using the 'buffers' (you can add more buffers by calling the script with the -n flag). Here's an example on one of the videos from the MedSAM2 demo:

multiobj_tracking_example.webm

heyoeyo, Sep 23 '24 14:09

I've just posted another example script for doing multi-object video segmentation, which might help if you're looking for a code-based starting point. It has hard-coded prompts which can be updated for your own video, but it's set up to work with a short video of horses available here: https://www.pexels.com/video/horses-running-on-grassland-4215784/

heyoeyo, Sep 23 '24 17:09

Thank you for your reply!

Greywan, Sep 29 '24 06:09