What does this PR do?

https://github.com/huggingface/transformers/issues/32308

As stated in the issue above, this PR makes SAM2 compatible with transformers.

cc. @zinccat @RUFFY-369

Fixes # (issue)

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[ ] Did you read the contributor guideline, Pull Request section?
[ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Jul 30 '24 08:07 SangbumChoi

closed due to https://github.com/huggingface/transformers/pull/32394

Aug 03 '24 01:08 SangbumChoi

@amyeroberts Can we continue this work on SAM2? I think the authors' contributions have stopped. cc @SangbumChoi @NielsRogge

Sep 22 '24 19:09 RUFFY-369

@RUFFY-369 I have reopened this PR. Let's get started from this branch, since I have updated it!

Sep 22 '24 22:09 SangbumChoi

Let's go :100:

Sep 23 '24 07:09 RUFFY-369

@RUFFY-369 @SangbumChoi Excited to see this being picked up and the SAM-2 efforts revived! As there hasn't been any recent activity on #32394, we'll treat this as the active PR which will likely be merged in.

Let us know if you need any help or have any questions getting this into the library!

Sep 27 '24 09:09 amyeroberts

Thank you very much for working on this! Can't wait to try this out

Sep 27 '24 14:09 giswqs

:100: :rocket:

Sep 27 '24 21:09 RUFFY-369

Are we targeting 2 or 2.1?

Oct 06 '24 13:10 bhack

AFAIK there is no architecture difference in 2.1, so we can target both, with 2.1 as the first priority.

Oct 06 '24 13:10 SangbumChoi

@qubvel Hi Pavel, I have finished the image + video checkpoint conversion and am now proceeding with the video inference pipeline.

For the image example, I imported the image from:

img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/sam-car.png"

Is there an appropriate place to upload a short video clip to use in the docs and for the conversion check?
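
For anyone following along, loading that image could look roughly like the sketch below. It assumes Pillow is installed; `load_image` and `image_from_bytes` are hypothetical helper names used only for illustration, not functions from transformers or from this PR.

```python
from io import BytesIO
from urllib.request import urlopen

from PIL import Image

# URL quoted in the comment above.
IMG_URL = (
    "https://huggingface.co/datasets/huggingface/documentation-images/"
    "resolve/main/transformers/model_doc/sam-car.png"
)


def image_from_bytes(data: bytes) -> Image.Image:
    """Decode raw image bytes and normalize to 3-channel RGB (hypothetical helper)."""
    return Image.open(BytesIO(data)).convert("RGB")


def load_image(url: str, timeout: float = 30.0) -> Image.Image:
    """Download an image over HTTP(S) and decode it (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return image_from_bytes(response.read())
```

Calling `load_image(IMG_URL)` should give a PIL image that can then be passed to an image processor. A video clip would need a different loader, which is why a separate hosted sample is being asked about.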

Dec 03 '24 09:12 SangbumChoi

Hi @SangbumChoi, a repo on the hub should be OK; alternatively, you can make a dataset. Then I can copy it to the hf-internal-testing org. Let me know if you face any issues 🤗

Dec 03 '24 10:12 qubvel

Hello everyone! I wanted to know if this is an active PR and if there is any way I can help so that this gets deployed in transformers? Please let me know :)

Jan 15 '25 20:01 neurohazardous

@neurohazardous Hi, it is an active PR, and image-based inference is already done here. What is left is video-based inference. If you are already familiar with transformers, we can discuss collaborating!

Jan 15 '25 23:01 SangbumChoi

Hi,

I really appreciate all the work in this PR and in the other SAM2 attempts in other PRs, but I am wondering: is it really a good goal to commit everything at once? There could be a phase 1 that finishes, tests, and merges SAM2's image-based features, with the rest continuing in a new PR from there. That makes review easier, people get partial results sooner, and when things need to be rebased, fewer files are involved. Given SAM2's improved speed, an image-only model would already be a great addition. What do you think?

Jan 20 '25 12:01 deepconvai

@qubvel Do you have any comments on the suggestion above?

Jan 21 '25 11:01 SangbumChoi

I think it might be a good option to deliver what we have as soon as possible. However, we must ensure that the API for the Image model will not be broken with the addition of the Video model, so it might be a bit tricky, as I'm not sure how tightly both of them are coupled.

Jan 21 '25 12:01 qubvel

Yeah, I agree that it will be tricky, and I would prefer to enable both.

Jan 21 '25 12:01 SangbumChoi

Folks would greatly appreciate image support even if video isn’t there yet, for what it’s worth.

Jan 26 '25 05:01 hipsterusername

@SangbumChoi @qubvel @RUFFY-369 thanks for your work on this. Do you have a guesstimated time frame for when this PR may be usable for video predictions?

I was thinking of trying to get something going with ONNX Runtime, but using Transformers.js with WebGPU would be much better ;)

Feb 19 '25 11:02 hlevring

Since there has been no recent activity in this PR, I suppose anyone from the community can reopen the PR and continue working on it. In case @SangbumChoi is fine with it, let's wait for his confirmation.

Feb 19 '25 11:02 qubvel

I’d propose that what is ready for image segmentation gets shipped if active work isn’t being done on video.

Feb 19 '25 12:02 hipsterusername

I cannot guarantee a finish date, but I will get back to this PR very soon, once my other ongoing PR gets merged.

Feb 19 '25 12:02 SangbumChoi

@qubvel

Hi Pavel, long time no see. Even though this PR is not fully ready, I have decided to finish the image part first. There are only a few things left:

convert all models (only tiny at the moment)
docstring update
example code update

I know you are busy but you might want to roughly review this one :)

Mar 15 '25 14:03 SangbumChoi

Hi @SangbumChoi ! Thanks for the huge work on this. I just made a quick pass for now and left a few comments mainly on things that need to be updated to follow new Transformers convention (init, no TensorFlow, attention implementation...). Since this PR was opened quite a long time ago, do you still have some time to work on it? Otherwise we can try to figure out how to take it from here. Thanks again!

@yonigozlan Hi, I will try!

Mar 25 '25 04:03 SangbumChoi

@qubvel @yonigozlan Hi, can you review this PR, apart from https://github.com/huggingface/transformers/pull/32317#discussion_r2010457443 ?

Apr 23 '25 07:04 SangbumChoi

Hi @SangbumChoi, we will try to review this week!

Apr 29 '25 09:04 qubvel

Hi @SangbumChoi ! Sorry for the delay on this, I have started reviewing and will continue this week. I'm happy to help push updates to get up to speed with the current state of Transformers and to support the video pipeline if you don't have the bandwidth to iterate on this in the coming weeks. Let me know!

May 20 '25 22:05 yonigozlan

@yonigozlan I am a little busy until June 5th, but I will do my best after that. The current model already has its weights converted, including the memory encoder, which is related to the video pipeline.

Let's do the video pipeline. :) I will summarize the TODOs during this week.

May 20 '25 22:05 SangbumChoi

@SangbumChoi Sounds good! I'll review + push some changes in the meantime then, let's see if I get to the video part by June 5 😅. Thanks again for your huge work, excited to merge this PR soon!

May 20 '25 22:05 yonigozlan

@yonigozlan Hi, I think I can work on this during CVPR week (maybe coding at the hotel :) ). I will start by going through the changes that you have made. Is there any specific detail that I should be aware of?

Jun 04 '25 13:06 SangbumChoi

Add Segment Anything 2 (SAM2)