MaxVit model
This PR adds the MaxVit architecture as part of the Batteries Phase 3 proposal. It is still a work in progress, as the model has yet to be trained.
One caveat about how this model API would be exposed to users: the architecture is bound to the specific input size it was trained on, because it uses relative positional encodings.
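To illustrate why relative positional encodings tie learned parameters to a spatial size, here is a minimal, generic sketch in the style used by Swin-like models. It is not the code from this PR: the function names are hypothetical, and the window sizes 7 and 8 are picked only for illustration. It shows that both the learned bias table and the lookup index change shape when the attention window changes, so weights trained for one size do not directly fit another.

```python
# Generic illustration of a relative position bias lookup.
# Assumption: square attention windows; names here are hypothetical
# and are not taken from the torchvision MaxVit implementation.


def bias_table_size(window):
    """Number of learnable bias entries for a window x window attention patch."""
    # Row and column offsets each range over [-(window - 1), window - 1].
    return (2 * window - 1) ** 2


def relative_position_index(window):
    """Map every (query, key) token pair to a slot in the bias table."""
    coords = [(r, c) for r in range(window) for c in range(window)]
    span = 2 * window - 1
    index = []
    for qr, qc in coords:
        row = []
        for kr, kc in coords:
            # Shift offsets so that they start at 0, then flatten to one integer.
            dr = qr - kr + window - 1
            dc = qc - kc + window - 1
            row.append(dr * span + dc)
        index.append(row)
    return index


if __name__ == "__main__":
    for window in (7, 8):
        idx = relative_position_index(window)
        print(
            f"window={window}: table entries={bias_table_size(window)}, "
            f"index shape={len(idx)}x{len(idx[0])}"
        )
```

A checkpoint trained with one window size stores a table and index of one shape, which is why changing the spatial layout typically requires interpolating or retraining these parameters.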
Running the command: torchrun --nproc_per_node=1 train.py --test-only --prototype --weights MaxVit_T_Weights.IMAGENET1K_V1 --model maxvit_t -b 1
yields the following results:
Test: Acc@1 83.700 Acc@5 96.722
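For readers unfamiliar with the metric names: Acc@1 and Acc@5 are top-1 and top-5 accuracy, reported as percentages. The sketch below is a plain-Python illustration of how a top-k accuracy is computed. It is not the reference script's implementation; the helper name `topk_accuracy` and the toy scores are invented for the example.

```python
# Minimal top-k accuracy sketch; toy data, hypothetical helper name.


def topk_accuracy(scores, targets, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    # Four samples, three classes (invented numbers).
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.3, 0.5],
        [0.6, 0.1, 0.3],
    ]
    targets = [1, 1, 0, 2]
    print("Acc@1", topk_accuracy(scores, targets, 1))
    print("Acc@2", topk_accuracy(scores, targets, 2))
```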
@TeodorPoncu It seems that in a recent commit, you accidentally updated all the expected files for all models. Could you please revert that?
@datumbox Sorry about that, everything should be fine now.
Related discussion and pointers on generalizing fixed resolution for Swin: https://github.com/pytorch/vision/issues/6227
Also, I wonder if more relative-attention related modules can be reused from Swin
Running the deployed weights with the following command:
torchrun --nproc_per_node=1 train.py --model maxvit_t --interpolation bicubic --batch-size 1 --test-only --weights MaxVit_T_Weights.IMAGENET1K_V1
yields the following results:
Test: Acc@1 83.700 Acc@5 96.722
Two more requests:
- Could you please upload the weights to manifold (see internal guide)?
- Could you update the PR description to showcase the output accuracy of the following command?
torchrun --nproc_per_node=1 train.py --test-only --prototype --weights MaxVit_T_Weights.IMAGENET1K_V1 --model maxvit_t -b 1
Hey @TeodorPoncu!
You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py