
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT)...

Results: 196 pytorch-image-models issues

When instantiating `torchvision`'s `ImageNet` or `ImageFolder`, the `download` argument is passed even though neither class accepts it. This PR removes the argument from the `torch_kwargs` dict...
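A minimal sketch of the kind of fix described, assuming a hypothetical `torch_kwargs` dict that gets forwarded to the torchvision dataset constructor (the helper name here is illustrative, not timm's actual code):

```python
# Hypothetical sketch: strip keys the torchvision dataset constructor
# does not accept (e.g. 'download' for ImageNet / ImageFolder) before
# forwarding the kwargs.
def filter_dataset_kwargs(torch_kwargs):
    """Return a copy of torch_kwargs without unsupported keys."""
    unsupported = {"download"}
    return {k: v for k, v in torch_kwargs.items() if k not in unsupported}

torch_kwargs = {"root": "/data/imagenet", "split": "val", "download": True}
clean = filter_dataset_kwargs(torch_kwargs)
print(clean)  # 'download' removed; other keys untouched
```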

Hi, we are a group of engineers from ByteDance Inc. This year, our team published the work "Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios" (https://arxiv.org/abs/2207.05501, https://github.com/bytedance/Next-ViT)...

Hi, I'm trying to use the `swin_base_patch4_window12_384` model to extract features, but I hit the following error:

```python
model = timm.create_model("swin_base_patch4_window12_384", features_only=True, pretrained=False)
```

```bash
AttributeError: 'SwinTransformer' object has no...
```

bug

The behavior is not obvious to me. Perhaps it's useful to mention this here to avoid confusion.

Is there a plan to support MobileViT v3? "MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features" (https://arxiv.org/abs/2209.15159, https://github.com/micronDLA/MobileViTv3) ![image](https://user-images.githubusercontent.com/23719775/212450286-bfdddeba-6795-4835-8b99-68af0f918ceb.png)

enhancement

Adapt the DaViT model from https://arxiv.org/abs/2204.03645 and https://github.com/dingmyu/davit. Notably, the model performs on par with many newer models, such as MaxViT, while having higher throughput, a design that should allow...

I think there is a `split` call in the model-name parsing that doesn't read the full name of the model. ## To Reproduce

```python
import timm
model...
```
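For illustration only (the delimiter involved in the reported bug is truncated above), a hedged sketch of how a naive `split` can drop part of a model name with multi-part underscore-separated names; both helpers here are hypothetical, not timm's actual parsing code:

```python
def model_base_naive(name):
    # Naive: take the text before the first '_' -> truncates multi-part names
    return name.split("_")[0]

def model_base_fixed(name):
    # Keep the full architecture name; only strip a trailing '.tag' suffix
    # (a hypothetical pretrained-tag convention for this sketch)
    return name.split(".", 1)[0]

print(model_base_naive("swin_base_patch4_window12_384"))  # 'swin' -- truncated
print(model_base_fixed("swin_base_patch4_window12_384"))  # full name preserved
```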

bug

Since FocalNet and Swin are related (and both need refactoring for better feature-extraction support), this combines:
* Introduction of the FocalNet arch
* Refactor of Swin V1/V2, and possibly other similar archs that could...

What batch sizes other than 1024 have you tried when training a DeiT or ViT model? In the DeiT paper (https://arxiv.org/abs/2012.12877), they used a batch size...
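When changing the batch size, the DeiT paper scales the learning rate linearly with batch size relative to a reference batch of 512. A small sketch of that rule (the function name is illustrative):

```python
def scaled_lr(base_lr=5e-4, batch_size=1024, base_batch=512):
    """Linear learning-rate scaling rule from the DeiT paper:
    lr = base_lr * batch_size / base_batch."""
    return base_lr * batch_size / base_batch

print(scaled_lr(batch_size=1024))  # 0.001 for the paper's batch size of 1024
print(scaled_lr(batch_size=256))   # 0.00025 for a smaller batch
```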

Hello, vision transformers in timm currently use a custom implementation of attention instead of `nn.MultiheadAttention`. PyTorch 2.0 will ship [FlashAttention](https://arxiv.org/abs/2205.14135), which is an exact implementation of attention, but...
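For reference, the quantity both implementations compute is the same scaled dot-product attention, softmax(QK^T / sqrt(d))V; fused kernels like FlashAttention change how it is computed, not what. A dependency-free sketch of the math (pure Python, single head, no batching):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    q, k, v are lists of vectors (lists of floats); d is the key dim."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))  # a convex mix of the two value vectors
```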

enhancement