
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT)...

Results: 196 pytorch-image-models issues

When instantiating `torchvision`'s `ImageNet` or `ImageFolder`, the `download` argument is passed even though neither class accepts it. This PR removes the argument from the `torch_kwargs` dict...
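A minimal sketch of the kind of fix described, assuming a hypothetical `torch_kwargs` dict that gets forwarded to the torchvision dataset constructor (the helper name here is illustrative, not timm's actual code):

```python
# Hypothetical sketch: strip keys the torchvision dataset constructor
# does not accept (e.g. 'download' for ImageNet / ImageFolder) before
# forwarding the kwargs.
def filter_dataset_kwargs(torch_kwargs):
    """Return a copy of torch_kwargs without unsupported keys."""
    unsupported = {"download"}
    return {k: v for k, v in torch_kwargs.items() if k not in unsupported}

torch_kwargs = {"root": "/data/imagenet", "split": "val", "download": True}
clean = filter_dataset_kwargs(torch_kwargs)
print(clean)  # 'download' removed; other keys untouched
```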

Hi, we are a group of engineers from ByteDance Inc. This year, our team published the work "Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios" (https://arxiv.org/abs/2207.05501, https://github.com/bytedance/Next-ViT)...

Hi, I'm trying to use the `swin_base_patch4_window12_384` model to extract features, but I hit the following error:

```python
model = timm.create_model("swin_base_patch4_window12_384", features_only=True, pretrained=False)
```

```bash
AttributeError: 'SwinTransformer' object has no...
```

bug

The behavior is not obvious to me. Perhaps it's useful to mention this here to avoid confusion.

Is there a plan to support MobileViT v3? "MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features" (https://arxiv.org/abs/2209.15159, https://github.com/micronDLA/MobileViTv3) ![image](https://user-images.githubusercontent.com/23719775/212450286-bfdddeba-6795-4835-8b99-68af0f918ceb.png)

enhancement

Adapt the DaViT model from https://arxiv.org/abs/2204.03645 and https://github.com/dingmyu/davit. Notably, the model performs on par with many newer models, such as MaxViT, while having higher throughput, a design that should allow...

I think there is a `split` call in the model-name parsing that doesn't read the full name of the model. ## To Reproduce

```python
import timm
model...
```
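For illustration only (the delimiter involved in the reported bug is truncated above), a hedged sketch of how a naive `split` can drop part of a model name with multi-part underscore-separated names; both helpers here are hypothetical, not timm's actual parsing code:

```python
def model_base_naive(name):
    # Naive: take the text before the first '_' -> truncates multi-part names
    return name.split("_")[0]

def model_base_fixed(name):
    # Keep the full architecture name; only strip a trailing '.tag' suffix
    # (a hypothetical pretrained-tag convention for this sketch)
    return name.split(".", 1)[0]

print(model_base_naive("swin_base_patch4_window12_384"))  # 'swin' -- truncated
print(model_base_fixed("swin_base_patch4_window12_384"))  # full name preserved
```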

bug

Since FocalNet and Swin are related (and both need refactoring for better feature-extraction support), this combines:
* Introduction of the FocalNet arch
* Refactor of Swin V1/V2, and possibly other similar archs that could...

What batch sizes other than 1024 have you tried when training a DeiT or ViT model? In the DeiT paper (https://arxiv.org/abs/2012.12877), they used a batch size...
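When changing the batch size, the DeiT paper scales the learning rate linearly with batch size relative to a reference batch of 512. A small sketch of that rule (the function name is illustrative):

```python
def scaled_lr(base_lr=5e-4, batch_size=1024, base_batch=512):
    """Linear learning-rate scaling rule from the DeiT paper:
    lr = base_lr * batch_size / base_batch."""
    return base_lr * batch_size / base_batch

print(scaled_lr(batch_size=1024))  # 0.001 for the paper's batch size of 1024
print(scaled_lr(batch_size=256))   # 0.00025 for a smaller batch
```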

Hello, vision transformers in timm currently use a custom implementation of attention instead of `nn.MultiheadAttention`. PyTorch 2.0 will ship [FlashAttention](https://arxiv.org/abs/2205.14135), which is an exact implementation of attention, but...
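For reference, the quantity both implementations compute is the same scaled dot-product attention, softmax(QK^T / sqrt(d))V; fused kernels like FlashAttention change how it is computed, not what. A dependency-free sketch of the math (pure Python, single head, no batching):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    q, k, v are lists of vectors (lists of floats); d is the key dim."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))  # a convex mix of the two value vectors
```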

enhancement