Add support for dora-samurai for visual tracking of masks using Samurai!
The Segment Anything Model 2 (SAM 2, wrapped by dora-sam2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly in crowded scenes with fast-moving or self-occluding objects.
dora could greatly benefit from using dora-samurai instead of dora-sam2 for tracking masked objects in motion.
Reference: https://yangchris11.github.io/samurai/
GitHub: https://github.com/yangchris11/samurai
See: https://github.com/yangchris11/samurai/blob/master/scripts/main_inference.py
@dora-bot assign me
Hello @Choudhry18, this issue is now assigned to you!
@haixuanTao do we need to implement a separate directory for dora-samurai in node-hub?
@Krishnadubey1008 I think the goal is to create a new node in the node hub that implements dora-samurai, the way there is one for dora-sam2. However, I am already working on this issue and am almost done; I will probably submit the PR later today. Do you want to work on something else?
@haixuanTao we are using dora-sam2 in the reach2 demo at the moment. Does the scope of this issue include replacing that with dora-samurai, or is that out of scope?
@haixuanTao Apologies for the repeated pings. I’ve finished implementing the node and wanted to test it before submitting the PR.
From my understanding, Samurai relies on an initial bounding box, points, or previous masks for visual tracking. I’m working on an example with a fixed initial bounding box, but I’d like your input on the best approach for determining the initial box for the dora-samurai node when an initial bounding box is not provided.
Yeah absolutely. I think that in case there is no bounding box, we should not do anything.
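The "do nothing until a bounding box arrives" behavior discussed above could be sketched as follows. This is an illustrative sketch only; `BoxGatedTracker` and its methods are hypothetical names, not the actual dora-samurai API, and the real node would call into SAMURAI instead of returning a placeholder:

```python
# Hypothetical sketch: gate tracking on the presence of an initial bounding box.
# Until a box is provided by an upstream node, frames are skipped and no mask
# is emitted, matching the behavior agreed on above.

class BoxGatedTracker:
    """Wraps a tracker so that frames are ignored until a bounding box is set."""

    def __init__(self):
        self.initial_box = None  # (x1, y1, x2, y2), set once by an upstream node

    def set_box(self, box):
        self.initial_box = box

    def step(self, frame):
        # No bounding box yet: do nothing and emit no mask.
        if self.initial_box is None:
            return None
        # Placeholder for the real SAMURAI propagation call on this frame.
        return {"box": self.initial_box, "frame": frame}


tracker = BoxGatedTracker()
tracker.step("frame-0")              # skipped: no box yet, returns None
tracker.set_box((10, 20, 110, 220))  # e.g. received on a "boxes2d" input
mask = tracker.step("frame-1")       # now produces output for each frame
```

In a dora node, the same gating would live inside the event loop: bounding-box input events set the box, and image input events are simply ignored while it is unset.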
@haixuanTao Thank you for getting back. I had a couple more questions:
- I have successfully implemented dora-samurai to work with a provided video file or folder of frames. However, dora-sam2 processes a real-time stream of frames and returns a mask for each frame. Should dora-samurai also support real-time video inference in a similar way? The current samurai implementation doesn't support real-time video inference, but I can try tweaking the package to make it work if that is our use case.
- I was unable to find a PyPI package for samurai, unlike the one for sam2 (https://github.com/dora-rs/dora/blob/4d6fabb59b0f0e8ea926210e8c8d17633ab699e1/node-hub/dora-sam2/pyproject.toml#L14). Adding samurai as a git submodule doesn't work reliably either (it uses hydra, which causes path issues when the package is used from outside the repo). My workaround was to fork the samurai project and turn it into a package. Is it better to keep it as a local package inside the dora-samurai node, or to publish it to PyPI and use it from there?
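For reference, the fork-based workaround in the second question could be declared as a direct git dependency in the node's `pyproject.toml`. This is only a sketch: `<fork-owner>` is a placeholder for wherever the packaged fork actually lives, not a real repository.

```toml
[project]
dependencies = [
    # hypothetical: direct reference to a packaged fork of samurai,
    # since no official PyPI distribution exists
    "samurai @ git+https://github.com/<fork-owner>/samurai.git",
]
```

A local path dependency inside the node directory would work too, but a git or PyPI source keeps the node installable on its own.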
Hey @Choudhry18, are you still working on this issue? If not, I can take a look 🙂️
@ShashwatPatil I am still working on it, had some delay with real time video inference on samurai but I am almost done now :)
@Choudhry18 has been automatically unassigned from this stale issue after 2 weeks of inactivity.
Is real-time video inference on samurai supported now?
@ShengkaiWu The samurai team hasn't released an official implementation for real-time video inference, but I was working on my own implementation; you can find it here if it helps. There is a propagate_streaming method that works the same way as the propagate_video method, but it runs inference frame by frame and periodically clears unused frames to prevent memory leaks.
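The frame-by-frame idea described above can be sketched roughly like this. Note this is an assumption-laden illustration, not the actual samurai code: `StreamingPropagator`, its cache layout, and the eviction policy are all hypothetical stand-ins for the real model state:

```python
# Illustrative sketch of streaming propagation with bounded memory.
# The real implementation would run SAMURAI's per-frame mask prediction
# where the placeholder below is; only the caching/eviction idea is shown.
from collections import OrderedDict


class StreamingPropagator:
    """Runs inference one frame at a time, evicting stale cached frames
    so memory stays bounded during long-running streams."""

    def __init__(self, max_cached_frames=16):
        self.max_cached_frames = max_cached_frames
        self.cache = OrderedDict()  # frame_idx -> per-frame features/masks

    def propagate_streaming(self, frame_idx, frame):
        # Placeholder for the real per-frame mask prediction.
        mask = f"mask-{frame_idx}"
        self.cache[frame_idx] = mask
        # Evict the oldest entries once the cache grows past the limit,
        # mirroring the periodic cleanup mentioned above.
        while len(self.cache) > self.max_cached_frames:
            self.cache.popitem(last=False)
        return mask


prop = StreamingPropagator(max_cached_frames=4)
for i in range(10):
    prop.propagate_streaming(i, frame=None)
# After the loop, only the 4 most recent frames remain cached.
```

Unlike a propagate-over-video call that sees the whole clip up front, this shape accepts frames as they arrive, which is what a real-time dora pipeline needs.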