
Add support for fine-tuning CLIP-like models using contrastive-image-text example

Open tjs-intel opened this issue 4 months ago • 4 comments

What does this PR do?

The contrastive-image-text example works for fine-tuning models whose model_type is "clip", but not for other models such as "chinese_clip" and "siglip", because the VisionTextDualEncoderConfig class is too specific to CLIP models.

This PR adds support for fine-tuning Chinese-CLIP and SigLIP vision models with the contrastive-image-text example.
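To illustrate the problem, here is a minimal sketch in plain Python (no transformers import) of resolving the vision part of a CLIP-like config through a lookup table instead of a hard-coded "clip" check. It is not this PR's actual diff: extract_vision_config and VISION_SUBCONFIG_TYPES are hypothetical names, configs are modeled as plain dicts, and the sub-config type strings are illustrative assumptions.

```python
from typing import Any, Dict

# Hypothetical table: composite model_type -> model_type assigned to its
# vision sub-config. The values are illustrative assumptions.
VISION_SUBCONFIG_TYPES: Dict[str, str] = {
    "clip": "clip_vision_model",
    "chinese_clip": "chinese_clip_vision_model",
    "siglip": "siglip_vision_model",
}


def extract_vision_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the vision settings of a CLIP-like config as a new dict.

    A check such as `if model_type == "clip"` only handles one family;
    with a table, supporting another family means adding one entry.
    """
    model_type = config["model_type"]
    if model_type in VISION_SUBCONFIG_TYPES:
        # Composite config: copy the nested vision settings and tag them
        # with the sub-config type so they can be rebuilt on their own.
        vision = dict(config["vision_config"])
        vision["model_type"] = VISION_SUBCONFIG_TYPES[model_type]
        return vision
    # Otherwise assume it is already a standalone vision config (e.g. "vit").
    return dict(config)


siglip_like = {"model_type": "siglip", "vision_config": {"hidden_size": 768}}
print(extract_vision_config(siglip_like))
# {'hidden_size': 768, 'model_type': 'siglip_vision_model'}
```

The input dict is copied rather than mutated, so the same composite config can be reused after its vision part has been extracted.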

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@amyeroberts @patil-suraj @patrickvonplaten

tjs-intel · Feb 16 '24 22:02

Fixing up this PR per the contributor guidelines now.

tjs-intel · Feb 16 '24 23:02

Happy to receive suggestions for suitable test candidates.

tjs-intel · Feb 16 '24 23:02

This has been manually tested by replacing openai/clip-vit-base-patch32 in the contrastive-image-text example with the following models:

	OFA-Sys/chinese-clip-vit-base-patch16
	facebook/metaclip-b32-400m
	google/siglip-so400m-patch14-384
	laion/CLIP-ViT-B-32-laion2B-s34B-b79K
	laion/CLIP-ViT-H-14-laion2B-s32B-b79K
	laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
	openai/clip-vit-base-patch32
	openai/clip-vit-large-patch14
	openai/clip-vit-large-patch14-336
	timm/ViT-SO400M-14-SigLIP-384
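The manual test amounts to swapping the vision checkpoint in the example's model-creation step. A minimal sketch, assuming this PR's changes are applied and that each vision tower is paired with a RoBERTa text encoder as in the example's README; build_dual_encoder and output_dir_for are hypothetical helpers, and FacebookAI/roberta-base is an assumed text checkpoint, not one stated in this thread. transformers is imported inside the function so the list can be inspected without it installed.

```python
from typing import List

# Vision checkpoints from the manual test list above.
VISION_CHECKPOINTS: List[str] = [
    "OFA-Sys/chinese-clip-vit-base-patch16",
    "facebook/metaclip-b32-400m",
    "google/siglip-so400m-patch14-384",
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k",
    "openai/clip-vit-base-patch32",
    "openai/clip-vit-large-patch14",
    "openai/clip-vit-large-patch14-336",
    "timm/ViT-SO400M-14-SigLIP-384",
]

# Assumption: text encoder paired with every vision tower.
TEXT_CHECKPOINT = "FacebookAI/roberta-base"


def output_dir_for(checkpoint: str) -> str:
    """Derive a filesystem-safe output directory name from a Hub checkpoint id."""
    return checkpoint.replace("/", "__") + "-roberta"


def build_dual_encoder(vision_checkpoint: str, output_dir: str,
                       text_checkpoint: str = TEXT_CHECKPOINT) -> None:
    """Combine a pretrained vision tower with a text encoder and save model + processor."""
    # Lazy import: only needed when a model is actually built.
    from transformers import (
        AutoImageProcessor,
        AutoTokenizer,
        VisionTextDualEncoderModel,
        VisionTextDualEncoderProcessor,
    )

    model = VisionTextDualEncoderModel.from_vision_text_pretrained(
        vision_checkpoint, text_checkpoint
    )
    tokenizer = AutoTokenizer.from_pretrained(text_checkpoint)
    image_processor = AutoImageProcessor.from_pretrained(vision_checkpoint)
    processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)

    # Save both so the directory can be loaded as a single checkpoint.
    model.save_pretrained(output_dir)
    processor.save_pretrained(output_dir)
```

Calling build_dual_encoder(ckpt, output_dir_for(ckpt)) for each entry would write one model-plus-processor directory per checkpoint, which can then be passed to the example's training script via --model_name_or_path. Each call downloads both checkpoints, so the loop is left to the caller.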

tjs-intel · Feb 16 '24 23:02

Not sure what's going on with these failing CI jobs:

	https://app.circleci.com/pipelines/github/huggingface/transformers/84689/workflows/02d18e8c-af6e-465d-8625-fb3dc53bc03e/jobs/1095368/parallel-runs/0/steps/0-116
	https://app.circleci.com/pipelines/github/huggingface/transformers/84689/workflows/02d18e8c-af6e-465d-8625-fb3dc53bc03e/jobs/1095369/parallel-runs/0/steps/0-115
	https://app.circleci.com/pipelines/github/huggingface/transformers/84689/workflows/02d18e8c-af6e-465d-8625-fb3dc53bc03e/jobs/1095365/parallel-runs/0/steps/0-117

tjs-intel · Feb 17 '24 00:02

Hi @tjs-intel, thanks for adding this! For the failing tests, could you try rebasing onto main? We recently had some issues with compatible library versions, which should now be resolved.

amyeroberts · Feb 19 '24 19:02

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.