transformers icon indicating copy to clipboard operation
transformers copied to clipboard

Adding imagebind

Open EduardoPach opened this issue 9 months ago β€’ 11 comments

What does this PR do?

This PR fixes https://github.com/huggingface/transformers/issues/23240 by adding ImageBind model.

This is based on https://github.com/huggingface/transformers/pull/26310 which is currently stale and the author said it would not have time to work on it (though welcome to help @dg845 ).

Taking into consideration the points raised by @dg845 here https://github.com/huggingface/transformers/pull/26310#issuecomment-1907231272 I'll focus on adding the text/image/audio portion and try to contact the authors.

Who can Review

@amyeroberts (?)

EduardoPach avatar May 07 '24 09:05 EduardoPach

I'll ping possible reviewers here as soon as it is ready for review

EduardoPach avatar May 07 '24 09:05 EduardoPach

@amyeroberts I think we're good for a first review!

There are a few points that we should have in mind:

  1. I have converted only text, vision, and audio modalities so far as those are the only modalities that have their preprocess steps in the original repo. There are a few issues in the original repo from people mentioning how they've managed the preprocessing of depth and thermal so if we want to we can add these as well.
  2. I'm not sure how we should deal with videos as they fall in the vision modality and thus should be passed as pixel_values as well. Preprocessing is equal to the image version + a frame selection logic. Should we have a videos arg in the ImageBindImageProcessor (currently this processor is just a copy from CLIP)
  3. In the case we add thermal and depth as well following the issues in the original repo should these be passed in the ImageBindImageProcessor as well?
  4. When pushing ImageBindProcessor to the hub I've noticed that the args for ImageBindImageProcessor and ImageBindFeatureExtractor are within the same file preprocessor_config.json which is not ideal

EduardoPach avatar May 23 '24 16:05 EduardoPach

c.c. @amyeroberts

EduardoPach avatar May 28 '24 13:05 EduardoPach

Maybe @NielsRogge could do a first check?

EduardoPach avatar May 31 '24 14:05 EduardoPach

c.c. @amyeroberts

EduardoPach avatar Jun 11 '24 13:06 EduardoPach

@molbap Could you give a first review, if you have the time?

amyeroberts avatar Jun 13 '24 11:06 amyeroberts

Haven't had much time lately to work on this, but I should be able to work during this week.

I'm still not completely sure how to deal with the ImageBindVideoProcessor as there's no precedent in the library where we have the option to pass both images and videos. Could we have this class implemented in the image_processing_imagebind.py? (For the ImageBindImageProcessor we can just use CLIPImageProcessor.

Also, since ImageBind has both image and audio preprocessor and a few arguments with the same name they are not distinguished in the preprocessor_config.json

c.c. @amyeroberts @molbap

EduardoPach avatar Jul 07 '24 13:07 EduardoPach

@EduardoPach

Regarding image and video imputs, there's video-llava which can accept both images and videos.

So in this case, we could have an ImageBindImageProcessor which takes both images and videos.

Also, since ImageBind has both image and audio preprocessor and a few arguments with the same name they are not distinguished in the preprocessor_config.json

Just for completeness, what arguments are overlapping? This might be the time to start splitting the preprocessor_config.json in to different configs for the different processing classes e.g. image_processor_config.json etc. cc @ydshieh @molbap

amyeroberts avatar Jul 09 '24 09:07 amyeroberts

Coming back this week!

EduardoPach avatar Jul 22 '24 17:07 EduardoPach

@EduardoPach

Regarding image and video imputs, there's video-llava which can accept both images and videos.

So in this case, we could have an ImageBindImageProcessor which takes both images and videos.

Also, since ImageBind has both image and audio preprocessor and a few arguments with the same name they are not distinguished in the preprocessor_config.json

Just for completeness, what arguments are overlapping? This might be the time to start splitting the preprocessor_config.json in to different configs for the different processing classes e.g. image_processor_config.json etc. cc @ydshieh @molbap

I think it's only the do_normalize

EduardoPach avatar Jul 23 '24 11:07 EduardoPach

Getting back at this

EduardoPach avatar Aug 27 '24 18:08 EduardoPach

LGTM

RUFFY-369 avatar Sep 01 '24 19:09 RUFFY-369

@amyeroberts, could you review this PR?

EduardoPach avatar Sep 02 '24 07:09 EduardoPach

πŸ˜ͺ

EduardoPach avatar Sep 13 '24 09:09 EduardoPach

@amyeroberts or @qubvel could any of you take a look on this? The failing tests are unrelated

EduardoPach avatar Sep 15 '24 10:09 EduardoPach

c.c. @RUFFY-369

EduardoPach avatar Sep 27 '24 14:09 EduardoPach

Hey! πŸ€— Thanks for your contribution to the transformers library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

  • Add the run-slow label to the PR
  • When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma separated list of all the models to be tested, i.e. [run_slow] model_to_test_1, model_to_test_2
    • If the pull request affects a lot of models, put at most 10 models in the commit message
  • A transformers maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.

@molbap I have addressed all the changes and left some questions as well Thank you :smile:

RUFFY-369 avatar Oct 02 '24 16:10 RUFFY-369

All tests are green

RUFFY-369 avatar Oct 05 '24 14:10 RUFFY-369

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Thanks @RUFFY-369 ! Reviewing this afternoon :hugs:

molbap avatar Oct 07 '24 11:10 molbap

Thanks @RUFFY-369 ! Reviewing this afternoon πŸ€—

SureπŸ€—

RUFFY-369 avatar Oct 07 '24 16:10 RUFFY-369

Hi @molbap , I have addressed all the final pass checks as well and left one two questions. If you can please have a look when you get the time.

Final pass I think before passing it to @ArthurZucker (pinging so it's on his radar). Added some comments on qkv biases that are nonstandard a a few other things, overall looks really good! I did run slow tests locally, left a comment on one but all seems inline.

I think if you agree with the changes done then cc @ArthurZucker can take on with the review round.

Thank you

RUFFY-369 avatar Oct 09 '24 12:10 RUFFY-369

All tests are green

RUFFY-369 avatar Oct 09 '24 12:10 RUFFY-369