Adding ImageBind
What does this PR do?
This PR fixes https://github.com/huggingface/transformers/issues/23240 by adding the ImageBind model.
This is based on https://github.com/huggingface/transformers/pull/26310, which is currently stale; the author said they would not have time to work on it (though you're welcome to help, @dg845).
Taking into consideration the points raised by @dg845 in https://github.com/huggingface/transformers/pull/26310#issuecomment-1907231272, I'll focus on adding the text/image/audio portion and try to contact the authors.
Who can review?
@amyeroberts (?)
I'll ping possible reviewers here as soon as it is ready for review
@amyeroberts I think we're good for a first review!
There are a few points we should keep in mind:

- I have converted only the text, vision, and audio modalities so far, as those are the only modalities whose preprocessing steps are in the original repo. There are a few issues in the original repo where people describe how they managed the preprocessing of depth and thermal, so we can add those as well if we want to.
- I'm not sure how we should deal with videos, as they fall under the vision modality and thus should also be passed as `pixel_values`. Their preprocessing is the same as the image version plus a frame-selection step. Should we have a `videos` arg in the `ImageBindImageProcessor` (currently this processor is just a copy of CLIP's)? See the sketch after this list.
- If we add thermal and depth as well, following the issues in the original repo, should these also be passed through the `ImageBindImageProcessor`?
- When pushing `ImageBindProcessor` to the Hub, I noticed that the args for `ImageBindImageProcessor` and `ImageBindFeatureExtractor` end up in the same `preprocessor_config.json` file, which is not ideal.
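A minimal sketch of the `videos` idea, assuming images and videos share the `pixel_values` pathway; `select_frames` and `preprocess` are hypothetical helpers (not existing transformers API), and the CLIP-style resize/rescale/normalize steps are elided:

```python
from typing import List, Optional

import numpy as np


def select_frames(video: np.ndarray, num_frames: int = 2) -> List[np.ndarray]:
    """Uniformly sample `num_frames` frames from a (T, H, W, C) clip."""
    indices = np.linspace(0, len(video) - 1, num=num_frames).round().astype(int)
    return [video[i] for i in indices]


def preprocess(
    images: Optional[List[np.ndarray]] = None,
    videos: Optional[List[np.ndarray]] = None,
) -> dict:
    """Route both input types into one `pixel_values` batch; videos add frame selection."""
    frames = list(images or [])
    for video in videos or []:
        frames.extend(select_frames(video))
    # ...resize / rescale / normalize each frame exactly as the image path does...
    return {"pixel_values": np.stack(frames)}
```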
cc @amyeroberts
Maybe @NielsRogge could do a first check?
cc @amyeroberts
@molbap Could you give a first review, if you have the time?
Haven't had much time lately to work on this, but I should be able to work on it this week.
I'm still not completely sure how to deal with the `ImageBindVideoProcessor`, as there's no precedent in the library for optionally accepting both images and videos. Could we implement this class in `image_processing_imagebind.py`? (For the `ImageBindImageProcessor` we can just use `CLIPImageProcessor`.)

Also, since ImageBind has both an image and an audio preprocessor, and a few of their arguments share the same name, they are not distinguished in the `preprocessor_config.json` (see the illustration below).
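To illustrate the name clash (the values here are assumptions, not the real defaults): if both sub-processors serialize into one `preprocessor_config.json`, the later entry silently wins for any shared key.

```python
# Assumed example values, not the actual ImageBind defaults.
image_processor_config = {"do_normalize": True, "size": 224}
feature_extractor_config = {"do_normalize": False, "num_mel_bins": 128}

# Flattening both into one preprocessor_config.json loses the distinction:
merged = {**image_processor_config, **feature_extractor_config}
print(merged["do_normalize"])  # False -- the audio value overwrote the image one
```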
cc @amyeroberts @molbap
@EduardoPach

Regarding image and video inputs, there's Video-LLaVA, which can accept both images and videos. So in this case, we could have an `ImageBindImageProcessor` which takes both images and videos.

> Also, since ImageBind has both an image and an audio preprocessor, and a few of their arguments share the same name, they are not distinguished in the `preprocessor_config.json`.

Just for completeness, which arguments are overlapping? This might be the time to start splitting the `preprocessor_config.json` into different configs for the different processing classes, e.g. `image_processor_config.json` etc. cc @ydshieh @molbap
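A hedged sketch of what the split could look like; the file names follow the suggestion above and the values are placeholders, not an adopted transformers convention:

```python
import json

# Hypothetical per-class config files; names and values are illustrative only.
configs = {
    "image_processor_config.json": {"do_normalize": True, "size": 224},
    "feature_extractor_config.json": {"do_normalize": False, "num_mel_bins": 128},
}
for filename, config in configs.items():
    with open(filename, "w") as f:
        json.dump(config, f, indent=2)
```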
Coming back this week!
> Just for completeness, which arguments are overlapping?
I think it's only the `do_normalize` argument.
Getting back to this
LGTM
@amyeroberts, could you review this PR?
💪
@amyeroberts or @qubvel, could one of you take a look at this? The failing tests are unrelated.
cc @RUFFY-369
Hey! 🤗 Thanks for your contribution to the `transformers` library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

- Add the `run-slow` label to the PR
- When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command `[run-slow]` followed by a comma-separated list of all the models to be tested, i.e. `[run_slow] model_to_test_1, model_to_test_2`
  - If the pull request affects a lot of models, put at most 10 models in the commit message
- A `transformers` maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.
@molbap I have addressed all the changes and left some questions as well. Thank you :smile:
All tests are green
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks @RUFFY-369! Reviewing this afternoon 🤗
Sure 🤗
Hi @molbap, I have addressed all the final-pass checks as well and left one or two questions. Please have a look when you get the time.
Final pass, I think, before handing it over to @ArthurZucker (pinging so it's on his radar). Added some comments on the qkv biases, which are nonstandard, and a few other things; overall it looks really good! I ran the slow tests locally and left a comment on one, but everything seems in line.
I think if you agree with the changes, @ArthurZucker can take over the next review round. cc @ArthurZucker
Thank you
All tests are green