
Add ViTamin models

Open Beckschen opened this issue 1 year ago • 3 comments

Add the ViTamin model, which is trained on the public DataComp-1B dataset using the OpenCLIP framework and reaches 82.9% zero-shot ImageNet-1K accuracy with 436M parameters. It achieves state-of-the-art performance on zero-shot image classification, multi-modal retrieval, open-vocabulary detection and segmentation, and large multimodal models.

The code for the ViTamin models is adapted from vision_transformer_hybrid.py in the timm codebase.

This ViTamin work has been accepted to CVPR 2024 (https://arxiv.org/pdf/2404.02132).

Beckschen avatar May 05 '24 06:05 Beckschen

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Beckschen thanks, probably a few more changes before the tests pass; if you get stuck I can help in a few days. For starters, the current failure: the dataclass init needs to use the default-factory pattern as here: https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/maxxvit.py#L137
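For context, the default-factory pattern referenced above looks like the following. This is a minimal sketch with hypothetical field names, not the actual ViTamin config: the point is that mutable defaults (lists, dicts, nested dataclasses) in a dataclass must be supplied via `field(default_factory=...)`, since a bare mutable literal raises an error or is shared across instances.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VitCfg:
    # Hypothetical config fields for illustration only.
    # Immutable defaults (ints, tuples) can be plain assignments.
    embed_dim: Tuple[int, ...] = (192, 384, 768)
    # Mutable defaults need a factory so each instance gets a fresh object.
    mlp_ratios: List[int] = field(default_factory=lambda: [4, 4, 4])


cfg = VitCfg()
```

Using `mlp_ratios: List[int] = [4, 4, 4]` directly would raise `ValueError: mutable default ... for field mlp_ratios is not allowed` at class-definition time, which matches the kind of CI failure described above.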

rwightman avatar May 05 '24 16:05 rwightman

Thanks very much, Ross @rwightman ! I've fixed the issue with the dataclass initialization. Could you please review it before proceeding with the merge? Thanks again!

Beckschen avatar May 14 '24 19:05 Beckschen

@Beckschen this required more changes so I've continued in another PR #2193 (which pulls these commits and adds my own), including an addition to the base vit model for xlarge (disable pos embed). I think it's working now but haven't done extensive checks... can add support to OpenCLIP now fairly easily, easier to verify it's correct there.
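The "disable pos embed" addition mentioned above can be sketched as follows. This is an illustrative toy module, not timm's actual implementation or API; names like `use_pos_embed` are assumptions. The idea is simply to make the learned positional embedding optional so a variant can skip it.

```python
import torch
import torch.nn as nn


class TokenStem(nn.Module):
    """Toy ViT token stem with an optional positional embedding (illustrative only)."""

    def __init__(self, num_tokens: int = 196, dim: int = 768, use_pos_embed: bool = True):
        super().__init__()
        # When disabled, no parameter is created and the addition is skipped.
        self.pos_embed = (
            nn.Parameter(torch.zeros(1, num_tokens, dim)) if use_pos_embed else None
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        if self.pos_embed is not None:
            x = x + self.pos_embed
        return x
```

With `use_pos_embed=False` the module is an identity over the tokens, which is the behavior a variant that disables positional embeddings would rely on.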

rwightman avatar Jun 04 '24 00:06 rwightman

I'm truly grateful for your help, @rwightman ! I saw there are changes regarding compatibility with vision_transformer.py and vision_transformer_hybrid.py. Thanks again!

The version is designed to support both timm and OpenCLIP. Thanks for merging the model configs in OpenCLIP.

Thanks again, @rwightman !

Best regards, Jieneng

Beckschen avatar Jun 07 '24 20:06 Beckschen