pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT)...

Results 237 pytorch-image-models issues

This PR adds compatibility for a third-party NPU backend in timm. Note: a config.yaml can be specified as the value of the ‘config’ variable to activate a third-party backend:...

A big WIP, pushing early to resolve masking stability issues with F.sdpa

Dear all, when trying to perform Quantization Aware Training (QAT), modules are wrapped with a [QuantWrapper](https://pytorch.org/docs/stable/generated/torch.ao.quantization.QuantWrapper.html). But because some models implement `qkv` with biases using `torch.nn.functional`, one has...

Currently, timm supports different image sizes at test time for ViT with absolute position encoding, but ViT with relative position encoding is not supported. However, those with relative position...

enhancement
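
One common way to let absolute position embeddings follow a change in image size is to resample the 2D grid of per-patch embeddings to the new grid shape. The sketch below illustrates that idea with bilinear interpolation on a tiny grid of scalars in plain Python. It is not timm's implementation: the helper name `resize_grid`, the align-corners convention, and the use of one scalar per patch (instead of an embedding vector) are all assumptions made for illustration.

```python
def resize_grid(grid, new_h, new_w):
    """Bilinearly resample a 2D grid (list of rows) to new_h x new_w.

    Hypothetical helper for illustration only. It uses an align-corners
    convention, so the four corner values of the input are preserved.
    """
    old_h, old_w = len(grid), len(grid[0])
    out = []
    for i in range(new_h):
        # Map the output row index to a fractional input row position.
        y = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            # Same mapping for the column index.
            x = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            # Blend along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# A 2x2 "position embedding" grid, one scalar per patch (assumed toy data).
pos_embed = [[0.0, 1.0], [2.0, 3.0]]

# Resample to a 3x3 patch grid, as if the test image were larger.
resized = resize_grid(pos_embed, 3, 3)
for r in resized:
    print(r)
```

A learned table of relative position biases is indexed by patch offsets rather than by absolute grid location, so it would likely need a different resampling scheme than the one sketched here.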

Hi, I found a typographical error in train.py at line 628, where ‘pipeiine’ should be ‘pipeline’: https://github.com/huggingface/pytorch-image-models/blob/b996c1a0f5068e7f5dfe69429e59e873536754c9/train.py#L628

Both are pyramid networks and can be used for multi-scale feature extraction, but to my knowledge do not support it like similar architectures such as PVT or Swin.

enhancement

I've trained Vision Transformer (ViT) models, small and large, with DINOv2 pretrained weights from [Facebook](https://github.com/facebookresearch/dinov2) (vit_small_patch14_reg4_dinov2.lvd142m) and timm (dinov2_vits14_reg_lc). The timm version underperforms, as seen in feature and attention map,...

bug

**Is your feature request related to a problem? Please describe.** Evaluating potential models involves not only performance but also licensing, e.g. whether a model can be used commercially. Therefore, it...

enhancement

**Is your feature request related to a problem? Please describe.** I am building a library called [mmit](https://github.com/abcamiletto/mmit) that automatically builds any decoder for any timm encoder. Due to the need...

enhancement

Add Meta's ImageBind ("ImageBind: One Embedding Space To Bind Them All"): https://github.com/facebookresearch/ImageBind We would implement the embeddings for the image modality.

enhancement