pytorch-image-models
The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT)...
**Describe the bug** When `timm.create_model` is called with the `features_only=True` argument, it returns a `FeatureListNet` module. This module cannot be correctly wrapped by `torch.distributed.fsdp.FullyShardedDataParallel` when using the `FULL_SHARD` strategy. ```BASH...
**Is your feature request related to a problem? Please describe.** Many models rely on a standard multi-head self-attention operator. Timm currently allows the user to choose between two versions, an...
The DPT decoder is widely used across various models. Could you add support for a DPT decoder model?
Sapiens is a visual foundation model designed for human-centric tasks, similar to the DINO family of models. Unlike DINO, however, Sapiens was trained without intentionally blurring human faces. Given the...
This PR extends the validation metrics functionality (precision, recall, F1-score) to the `train.py` script. ### Changes: - The `validate` function within `train.py` now supports the `--metrics-avg` flag. - Implemented `torch.distributed.all_gather`...
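To make the averaging behind a flag like `--metrics-avg` concrete, here is a minimal pure-Python sketch of precision, recall, and F1 computed over predictions pooled from several workers. It is not the PR's code: the function name `prf_scores`, the `"macro"`/`"micro"` option values, and the list concatenation standing in for `torch.distributed.all_gather` are all assumptions made for illustration.

```python
from collections import Counter


def prf_scores(targets, preds, num_classes, average="macro"):
    """Return (precision, recall, f1) for single-label classification.

    Hypothetical helper for illustration; not part of timm.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(targets, preds):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def _prf(tp_, fp_, fn_):
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    if average == "micro":
        # Pool the counts over all classes, then compute once.
        return _prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    if average == "macro":
        # Compute per class, then take the unweighted mean.
        per_class = [_prf(tp[c], fp[c], fn[c]) for c in range(num_classes)]
        return tuple(sum(vals) / num_classes for vals in zip(*per_class))
    raise ValueError(f"unknown average: {average!r}")


# Per-rank shards; concatenating them stands in for an all_gather step.
rank_targets = [[0, 1, 2], [1, 0, 2]]
rank_preds = [[0, 2, 2], [1, 0, 1]]
all_targets = [t for shard in rank_targets for t in shard]
all_preds = [p for shard in rank_preds for p in shard]

macro = prf_scores(all_targets, all_preds, num_classes=3, average="macro")
micro = prf_scores(all_targets, all_preds, num_classes=3, average="micro")
```

Pooling predictions before computing the scores matters in the distributed case: averaging per-rank F1 values generally gives a different number than computing F1 once over the gathered predictions.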
This feature request is related to challenges in Masked Image Modeling (MIM) pre-training using vision transformer models in `timm`. Currently, embedding and feature extraction are tightly coupled within `forward_features`, making...
# Updates Per further discussion, the difference is intentional, but undocumented. It is a difference with the reference implementation from Google Big Vision. --- # Original Report Fix location: https://github.com/huggingface/pytorch-image-models/blob/a7c5368ba0c8713dc1c9a98cc83bf46ddd02b0a0/timm/models/naflexvit.py#L1767...