pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT)...
Add the Hiera vision model from Meta's paper "Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles" https://github.com/facebookresearch/hiera/tree/main
**Is your feature request related to a problem? Please describe.** Currently, the timm library lacks implementations for Variational Autoencoder (VAE) and Vector Quantized VAE (VQ-VAE) models. Users looking to utilize...
From-scratch impl of [DependencyViT](https://arxiv.org/abs/2304.03282). [Official impl](https://github.com/dingmyu/DependencyViT) not published. Not competitive with SOTA hierarchical models, and the advantage over isometric models is lost due to the inability to use `F.scaled_dot_product_attention`, but an interesting mechanism. Mainly...
Update ML Decoder's `TransformerDecoderLayerOptimal` module to comply with what `nn.TransformerDecoder` expects. Current changes work with resnet50. `add_ml_decoder_head` needs to be updated for other models. In my limited testing, the following...
Segment Anything (SAM) uses the ViTDet backbone [[paper](https://arxiv.org/pdf/2203.16527.pdf)]. The only difference is that SAM adds a 3 layer neck. This neck can be disabled through arguments. By changing this 1...
Hi there! I am the author of the ICCV 2023 paper titled "Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?", which focuses on benchmarking pooling techniques for...
https://github.com/zhangyuanhan-ai/bamboo has released some interesting models, e.g., Bamboo-CLS ResNet-50 and Bamboo-CLS ViT-B/16. The ViT model beats most base ViTs I have seen on challenging datasets such as [ObjectNet](https://paperswithcode.com/sota/image-classification-on-objectnet).
**Is your feature request related to a problem? Please describe.** When running the training script, there is no way to stop training early when performance plateaus. This causes 2 problems:...
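A patience-based early-stopping hook like the one requested could look like the sketch below. This is a minimal, framework-agnostic illustration, not timm's API; the class name, parameters, and `step` method are all hypothetical.

```python
class EarlyStopping:
    """Minimal early-stopping helper (illustrative sketch, not part of timm).

    Signals a stop when the monitored metric fails to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """

    def __init__(self, patience=5, min_delta=0.0, mode="max"):
        self.patience = patience
        self.min_delta = min_delta
        # Flip the comparison so "higher is better" and "lower is better"
        # (e.g. accuracy vs. loss) share one code path.
        self.sign = 1.0 if mode == "max" else -1.0
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric; return True if training should stop."""
        if self.best is None or self.sign * (metric - self.best) > self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop this would be called once per validation pass, e.g. `if stopper.step(val_acc): break`.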
**Is your feature request related to a problem? Please describe.** In the paper "[TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance](https://openaccess.thecvf.com/content/ICCV2023/html/Wu_TinyCLIP_CLIP_Distillation_via_Affinity_Mimicking_and_Weight_Inheritance_ICCV_2023_paper.html)" from Microsoft, they provide a novel cross-modal distillation...
Hi there! From [`CONTRIBUTING.md`](https://github.com/huggingface/pytorch-image-models/blob/ef72c3cd470dd67836eebf95ec567199c890a6a2/CONTRIBUTING.md): > Code linting and auto-format (black) are not currently in place but open to consideration I suggest using [ruff](https://github.com/astral-sh/ruff)'s code linter (and new formatter), like...
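Adopting ruff typically means adding a small section to `pyproject.toml`. The fragment below is a hypothetical starting point, not the project's actual configuration; the rule selection and line length are assumptions for illustration.

```toml
# pyproject.toml -- hypothetical ruff config, values chosen for illustration
[tool.ruff]
line-length = 120

[tool.ruff.lint]
# E = pycodestyle errors, F = pyflakes, I = import sorting (isort-style)
select = ["E", "F", "I"]
ignore = ["E501"]  # defer long-line cleanup in a large existing codebase
```

Running `ruff check .` (and `ruff format .` for the formatter) in CI would then enforce these rules incrementally.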