
The largest collection of PyTorch image encoders / backbones. Includes training, evaluation, inference, and export scripts, plus pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT)...

Results: 237 pytorch-image-models issues

Paper: - [InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions](https://arxiv.org/pdf/2211.05778) - [DCNv4](https://arxiv.org/pdf/2401.06197) Adapted from official impl at https://github.com/OpenGVLab/DCNv4 Some clarifications: - FlashInternImage is the InternImage model that uses DCNv4...

Add the ViTamin model, which is trained on the public DataComp-1B dataset using the OpenCLIP framework and obtains 82.9% zero-shot ImageNet-1K accuracy with 436M parameters. It achieves state-of-the-art performance on zero-shot image...

https://github.com/NVlabs/RADIO The code and model weights of the paper *[CVPR 2024] AM-RADIO: Agglomerative Vision Foundation Model - Reduce All Domains Into One* have been released by NVIDIA > RADIO, a...

enhancement

CvT as described in https://arxiv.org/abs/2103.15808, a Swin-era hierarchical transformer. From-scratch reimplementation, cleaner than the original (https://github.com/microsoft/CvT/tree/main): exposes most module cfgs as kwargs and uses sdpa/timm style. WIP/barebones test for now, stuck at...

Intel Gaudi & GPU Max come with their own distributed backends (hccl and ccl, respectively). This patch enables those devices to be used in parallel to speed up training.

Hello, the MobileNetV4 paper has been released~~ Is there any plan to add MobileNetV4 to this repo? https://arxiv.org/pdf/2404.10518

enhancement

**Describe the bug** When I reparameterize the NextViT model for ONNX export, it returns the error: `self.norm(x). None Object is not callable.` I believe line 200 of file https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/nextvit.py...

bug

## Motivation

Chaining the **un**pooled output to the classifier has been [implemented](https://huggingface.co/docs/timm/feature_extraction#chaining-unpooled-output-to-classifier) and can be done as follows:

```python
model = timm.create_model('vit_medium_patch16_reg1_gap_256', pretrained=True)
output = model.forward_features(torch.randn(2, 3, 256, 256))
classified = model.forward_head(output)
```

Compared the...

enhancement