
The largest collection of PyTorch image encoders / backbones. Includes training, evaluation, inference, and export scripts, plus pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT)...

Results: 237 pytorch-image-models issues

Paper: - [InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions](https://arxiv.org/pdf/2211.05778) - [DCNv4](https://arxiv.org/pdf/2401.06197) Adapted from official impl at https://github.com/OpenGVLab/DCNv4 Some clarifications: - FlashInternImage is the InternImage model that uses DCNv4...

Add the ViTamin model, which is trained on the public DataComp-1B dataset using the OpenCLIP framework and obtains 82.9% zero-shot ImageNet-1K accuracy with 436M parameters. It achieves state-of-the-art performance on zero-shot image...

https://github.com/NVlabs/RADIO The code and model weights of the paper *[CVPR 2024] AM-RADIO: Agglomerative Vision Foundation Model - Reduce All Domains Into One* have been released by NVIDIA > RADIO, a...

enhancement

CvT as described in https://arxiv.org/abs/2103.15808, a Swin-era hierarchical transformer. From-scratch reimplementation, cleaner than the original (https://github.com/microsoft/CvT/tree/main): exposes most module cfgs as kwargs and uses sdpa/timm style. WIP/barebones test for now, stuck at...

Intel Gaudi & GPU Max come with their own distributed backends (hccl and ccl, respectively). This patch enables those devices to be used in parallel to speed up training.

Hello, the MobileNetV4 paper has been released~~ Is there any plan to add MobileNetV4 to this repo? https://arxiv.org/pdf/2404.10518

enhancement

**Describe the bug** When I reparameterize the NextViT model for ONNX export, it returns the error: `self.norm(x). None Object is not callable.` I believe line 200 of file https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/nextvit.py...

bug

## Motivation

Chaining the **un**pooled output to the classifier has been [implemented](https://huggingface.co/docs/timm/feature_extraction#chaining-unpooled-output-to-classifier) and can be done as follows:

```python
model = timm.create_model('vit_medium_patch16_reg1_gap_256', pretrained=True)
output = model.forward_features(torch.randn(2, 3, 256, 256))
classified = model.forward_head(output)
```

Compared the...

enhancement