vision MaxVit model

This PR follows up on the Batteries Phase 3 proposal to add the MaxVit architecture. It is still a work in progress, as the model has yet to be trained.

One caveat about how we would expose this model API to users: the architecture is bound to the specific input size it was trained on (due to the use of relative positional encodings).
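To illustrate the caveat, here is a minimal, generic sketch of how a learned relative position bias table is usually indexed for a square attention window. It is not the code from this PR; the function names `relative_position_index` and `bias_table_size` are hypothetical and chosen only for illustration. The point is that the number of learned entries depends on the window size, so a table learned for one size does not line up with another.

```python
from itertools import product
from typing import List


def bias_table_size(window: int) -> int:
    """Number of distinct (dy, dx) offsets between two tokens in a window x window patch."""
    return (2 * window - 1) ** 2


def relative_position_index(window: int) -> List[List[int]]:
    """For every (query, key) token pair, return the row of the bias table to look up.

    Hypothetical helper for illustration only; not the torchvision implementation.
    """
    coords = list(product(range(window), repeat=2))  # (y, x) of each token
    index = []
    for qy, qx in coords:
        row = []
        for ky, kx in coords:
            # Shift offsets from [-(window - 1), window - 1] to [0, 2 * window - 2].
            dy = qy - ky + window - 1
            dx = qx - kx + window - 1
            row.append(dy * (2 * window - 1) + dx)
        index.append(row)
    return index


if __name__ == "__main__":
    for window in (7, 8):
        idx = relative_position_index(window)
        print(window, len(idx), bias_table_size(window))
```

Under this scheme a window of 7 needs 169 learned entries and a window of 8 needs 225, so changing the size the attention operates on requires either interpolating the table or retraining it.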

Running the command: torchrun --nproc_per_node=1 train.py --test-only --prototype --weights MaxVit_T_Weights.IMAGENET1K_V1 --model maxvit_t -b 1 yields the following results:

Test: Acc@1 83.700 Acc@5 96.722

Aug 01 '22 14:08 TeodorPoncu

@TeodorPoncu It seems that in a recent commit, you accidentally updated all the expected files for all models. Could you please revert that?

Aug 05 '22 18:08 datumbox

@datumbox Sorry about that, everything should be fine now.

Aug 05 '22 19:08 TeodorPoncu

Related discussion and pointers on generalizing fixed resolution for Swin: https://github.com/pytorch/vision/issues/6227

Also, I wonder whether more of the relative-attention modules could be reused from Swin.

Aug 19 '22 17:08 vadimkantorov

Running the deployed weights with the following command: torchrun --nproc_per_node=1 train.py --model maxvit_t --interpolation bicubic --batch-size 1 --test-only --weights MaxVit_T_Weights.IMAGENET1K_V1

Yields the following results: Test: Acc@1 83.700 Acc@5 96.722

Sep 21 '22 15:09 TeodorPoncu

Two more requests:

Could you please upload the weights to manifold (see internal guide)?
Could you update the PR description to showcase the output accuracy of the following command?

torchrun --nproc_per_node=1 train.py --test-only --prototype --weights MaxVit_T_Weights.IMAGENET1K_V1 --model maxvit_t -b 1

Sep 21 '22 16:09 datumbox

Hey @TeodorPoncu!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

Sep 23 '22 12:09 github-actions[bot]
