
Add support for dora-samurai for visual tracking of masks using Samurai!

Open · haixuanTao opened this issue 9 months ago · 13 comments

The Segment Anything Model 2 (SAM 2, dora-sam2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects.

dora could greatly benefit from using dora-samurai instead of dora-sam2 in order to track masked objects in motion.

Reference: https://yangchris11.github.io/samurai/
GitHub: https://github.com/yangchris11/samurai
See: https://github.com/yangchris11/samurai/blob/master/scripts/main_inference.py

haixuanTao · Mar 26 '25

@dora-bot assign me

Choudhry18 · Mar 26 '25

Hello @Choudhry18, this issue is now assigned to you!

github-actions[bot] · Mar 26 '25

@haixuanTao do we need to implement a separate directory for dora-samurai in node-hub?

Krishnadubey1008 · Mar 27 '25

@Krishnadubey1008 I think the goal is to create a new node in the node hub that implements dora-samurai, the way there is one for dora-sam2. However, I am already working on the issue and almost done; I will probably submit the PR later today. Do you want to work on something else?

Choudhry18 · Mar 27 '25

@haixuanTao we are using dora-sam2 in the reach2 demo at the moment. Does the scope of this issue include replacing that with dora-samurai, or is that out of scope?

Choudhry18 · Mar 27 '25

@haixuanTao Apologies for the repeated pings. I’ve finished implementing the node and wanted to test it before submitting the PR.

From my understanding, Samurai relies on an initial bounding box, points, or previous masks for visual tracking. I’m working on an example with a fixed initial bounding box, but I’d like your input on the best approach for determining the initial box for the dora-samurai node when an initial bounding box is not provided.

Choudhry18 · Mar 28 '25

Yeah, absolutely. I think that if there is no bounding box, we should not do anything.

haixuanTao · Mar 28 '25
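A minimal sketch of how that behavior could look in a dora node, assuming the dora Python API (`Node`, `send_output`) and a hypothetical `SamuraiTracker` wrapper; the input names (`boxes2d`, `image`) and metadata keys are assumptions, not the actual implementation. Frames are simply ignored until an initial bounding box arrives:

```python
import pyarrow as pa
from dora import Node

# Hypothetical wrapper around the SAMURAI predictor; the real node would build
# the model from a checkpoint and config instead.
from samurai_wrapper import SamuraiTracker


def main():
    node = Node()
    tracker = None  # no tracking until an initial bounding box has been received

    for event in node:
        if event["type"] != "INPUT":
            continue

        if event["id"] == "boxes2d":
            # Initial prompt as [x_min, y_min, x_max, y_max] (input name assumed).
            box = event["value"].to_numpy(zero_copy_only=False)[:4]
            tracker = SamuraiTracker(initial_box=box)

        elif event["id"] == "image":
            if tracker is None:
                continue  # no bounding box yet: do nothing, as agreed above

            meta = event["metadata"]
            height = meta.get("height", 480)  # metadata keys assumed
            width = meta.get("width", 640)
            frame = event["value"].to_numpy(zero_copy_only=False).reshape((height, width, 3))

            mask = tracker.track(frame)  # hypothetical per-frame tracking call
            node.send_output("masks", pa.array(mask.ravel()), meta)


if __name__ == "__main__":
    main()
```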

@haixuanTao Thank you for getting back to me. I had a couple more questions:

  1. I have successfully implemented dora-samurai to work with a provided video file or folder of frames. However, dora-sam2 processes a real-time stream of frames and returns masks for each frame. Should dora-samurai also support real-time video inference in a similar way? Currently the samurai implementation doesn't support real-time video inference, but I can try tweaking the package to make it work if that is our use case (the two usage patterns are sketched after this comment).

  2. I was unable to find a PyPI package for samurai, unlike the one for sam2: https://github.com/dora-rs/dora/blob/4d6fabb59b0f0e8ea926210e8c8d17633ab699e1/node-hub/dora-sam2/pyproject.toml#L14. Adding samurai as a git submodule doesn't work reliably either (they are using hydra, and it causes path issues when the package is used from outside). My workaround was creating a fork of the samurai project and making it a package. I wanted to ask whether it is better to keep it as a local package inside the dora-samurai node or to publish it to PyPI and use it from there.

Choudhry18 · Mar 31 '25
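To make the difference in item 1 concrete, here is a rough comparison of the two usage patterns, written against a SAM 2-style video predictor. The streaming half is hypothetical, since upstream SAMURAI only exposes offline inference, and the exact method names may differ from the fork:

```python
from typing import Iterable, Iterator

import numpy as np


def run_offline(predictor, frames_dir: str, initial_box: np.ndarray) -> Iterator:
    """Offline pattern (what scripts/main_inference.py supports): every frame
    must already be on disk before propagation starts."""
    state = predictor.init_state(video_path=frames_dir)  # SAM 2-style API
    predictor.add_new_points_or_box(state, frame_idx=0, obj_id=0, box=initial_box)
    # Yields (frame_idx, obj_ids, masks) for the whole pre-recorded video.
    yield from predictor.propagate_in_video(state)


def run_streaming(tracker, frames: Iterable[np.ndarray], initial_box: np.ndarray) -> Iterator[np.ndarray]:
    """Streaming pattern a dora node needs: frames arrive one at a time from a
    camera. `tracker` and its methods are hypothetical placeholders."""
    state = tracker.init_streaming_state(initial_box)
    for frame in frames:
        yield tracker.propagate_streaming(state, frame)
```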

Hey @Choudhry18, are you still working on this issue? If not, I can look at it 🙂

ShashwatPatil · Apr 04 '25

@ShashwatPatil I am still working on it; I had some delays with real-time video inference on samurai, but I am almost done now :)

Choudhry18 · Apr 04 '25

@Choudhry18 has been automatically unassigned from this stale issue after 2 weeks of inactivity.

github-actions[bot] · Apr 19 '25

Is real-time video inference on samurai supported now?

ShengkaiWu · Sep 12 '25

@ShengkaiWu The samurai team hasn't released an official implementation for real-time video inference, but I was working on my own implementation; you can find it here if it helps. There is a propagate_streaming method that works the same way as the propagate_video method, but it does inference frame by frame and clears unused frames periodically to prevent memory leaks.

Choudhry18 · Sep 12 '25
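For readers landing here later, a minimal sketch of what such a propagate_streaming loop can look like, assuming a hypothetical single-frame `track_frame` call and a `release_frame` cleanup hook (neither is the upstream SAMURAI API). The point is that frames outside a small window are evicted so memory stays bounded on long streams:

```python
from collections import OrderedDict

import numpy as np


class StreamingPropagator:
    """Frame-by-frame propagation with periodic eviction of old frames."""

    def __init__(self, predictor, initial_box: np.ndarray, window: int = 16):
        self.predictor = predictor
        self.window = window         # number of recent frames kept in memory
        self.frames = OrderedDict()  # frame_idx -> frame, oldest first
        self.state = predictor.init_streaming_state(initial_box)
        self.frame_idx = 0

    def propagate_streaming(self, frame: np.ndarray) -> np.ndarray:
        """Run inference on a single incoming frame and return its mask."""
        self.frames[self.frame_idx] = frame
        mask = self.predictor.track_frame(self.state, self.frame_idx, frame)  # hypothetical call
        self.frame_idx += 1

        # Drop frames (and their cached features) outside the window so that
        # long-running streams do not grow memory without bound.
        while len(self.frames) > self.window:
            old_idx, _ = self.frames.popitem(last=False)
            self.predictor.release_frame(self.state, old_idx)  # hypothetical cleanup hook
        return mask
```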