Add Segment Anything 2 (SAM2)
What does this PR do?
https://github.com/huggingface/transformers/issues/32308
As stated in this issue, this PR makes SAM2 compatible with transformers.
cc. @zinccat @RUFFY-369
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
closed due to https://github.com/huggingface/transformers/pull/32394
@amyeroberts Can we continue this work on SAM2? I think the authors' contributions have stopped. cc @SangbumChoi @NielsRogge
@RUFFY-369 I have reopened this PR. Let's get started from this branch, since I have updated it!
Let's go :100:
@RUFFY-369 @SangbumChoi Excited to see this being picked up and the SAM-2 efforts revived! As there hasn't been any recent activity on #32394, we'll treat this as the active PR which will likely be merged in.
Let us know if you need any help or have any questions getting this into the library!
Thank you very much for working on this! Can't wait to try this out
:100: :rocket:
Are we targeting 2 or 2.1?
AFAIK there is no architecture difference in 2.1, so we can target both, with 2.1 as the first priority.
@qubvel Hi Pavel, I have finished the image and video checkpoint conversion and am now working on the video inference pipeline.
For the image example, I loaded the image with
img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/sam-car.png"
Is there an appropriate place to upload a short video clip for use in the docs and for the conversion check?
Hi @SangbumChoi, a repo on the hub should be OK; alternatively, you can make a dataset. Then I can copy it to the hf-internal-testing org. Let me know if you face any issues 🤗
Hello everyone! I wanted to know if this is an active PR and if there is any way I can help so that this gets deployed in transformers? Please let me know :)
@neurohazardous Hi, it is an active PR, and image-based inference is already done in this PR. What is left is video-based inference. If you are already familiar with transformers, we can discuss collaborating!
Hi,
I really appreciate all the work in this PR and in the other SAM2 attempts in other PRs, but I am wondering: is it really a good goal to commit everything at once? There could be a phase 1 (finishing, testing, and merging SAM2's image-based features), with work continuing in a new PR from there. It makes review easier, people get partial results sooner, and when things need to be rebased there are fewer files to deal with. Given SAM2's improved speed, an image-only model would already be a great addition. What do you think?
@qubvel Do you have any additional comments on the suggestion above?
I think it might be a good option to deliver what we have as soon as possible. However, we must ensure that the API for the Image model will not be broken with the addition of the Video model, so it might be a bit tricky, as I'm not sure how tightly both of them are coupled.
Yeah, I also agree that it will be tricky, and I would prefer to enable both.
Folks would greatly appreciate image support even if video isn’t there yet, for what it’s worth.
@SangbumChoi @qubvel @RUFFY-369 thanks for your work on this. Do you have a rough time frame for when this PR may be usable for video predictions?
I was thinking of trying to get something going with ONNX Runtime, but using Transformers.js with WebGPU would be much better ;)
Since there has been no recent activity in this PR, I suppose anyone from the community can reopen the PR and continue working on it. In case @SangbumChoi is fine with it, let's wait for his confirmation.
I’d propose that what is ready for image segmentation gets shipped if active work isn’t being done on video.
I cannot guarantee a finish date, but I will resume work on this PR very soon, once my other ongoing PR gets merged.
@qubvel
Hi Pavel, long time no see. Even though this PR is not fully ready, I have decided to finish the image part first. There are only a few things left:
- convert all models (only tiny atm)
- docstring update
- example code update
I know you are busy but you might want to roughly review this one :)
Hi @SangbumChoi ! Thanks for the huge work on this. I just made a quick pass for now and left a few comments, mainly on things that need to be updated to follow the new Transformers conventions (init, no TensorFlow, attention implementation...). Since this PR was opened quite a long time ago, do you still have some time to work on it? Otherwise we can try to figure out how to take it from here. Thanks again!
@yonigozlan Hi, I will try!
@qubvel @yonigozlan Hi, can you review this PR, except for https://github.com/huggingface/transformers/pull/32317#discussion_r2010457443 ?
Hi @SangbumChoi, we will try to review this week!
Hi @SangbumChoi ! Sorry for the delay on this, I have started reviewing and will continue this week. I'm happy to help push updates to get up to speed with the current state of Transformers and to support the video pipeline if you don't have the bandwidth to iterate on this in the coming weeks. Let me know!
@yonigozlan I am a little busy until June 5th, but I can do my best after that. The weights of the current model are already converted, including the memory encoder, which is related to the video pipeline.
Let's do the video pipeline. :) I will summarize the TODOs during this week.
@SangbumChoi Sounds good! I'll review + push some changes in the meantime then, let's see if I get to the video part by June 5 😅. Thanks again for your huge work, excited to merge this PR soon!
@yonigozlan Hi, I think I can work on this during CVPR week (maybe coding at the hotel :) ). I will start by going through the changes you have made. Are there any specific details I should be aware of?