pytorch-image-models

[Feature] Make (non `rmlp`) MaxxViT to support variable resolution

Open mywebinfo65536 opened this issue 2 years ago • 1 comments

Dear rwightman, thanks for your work. I tried to pass a tensor of size (16, 3, 112, 112) to the MaxxViT small 224 model to test it, but it failed. Do you have any solutions?

mywebinfo65536 avatar Dec 17 '22 02:12 mywebinfo65536

@mywebinfo65536 Some of the maxvit models are currently resizable: any of the models with rmlp in the name will resize if you pass img_size to the model on creation, BUT it must be divisible by 32, so 96 and 128 work, but not 112. This model can do that: maxvit_rmlp_tiny_rw_256
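As a rough illustration of the divisible-by-32 constraint, the sketch below finds the nearest sizes that would be accepted for a requested resolution. The helper name `nearest_valid_sizes` is hypothetical and not part of timm; only the `img_size` argument and the model name come from the comment above.

```python
from typing import Tuple


def nearest_valid_sizes(size: int, multiple: int = 32) -> Tuple[int, int]:
    """Return the closest multiples of `multiple` at or below and at or above `size`.

    Hypothetical helper for illustration only; it is not a timm function.
    """
    if multiple < 1 or size < multiple:
        raise ValueError("size must be at least one full multiple")
    lower = (size // multiple) * multiple
    upper = lower if lower == size else lower + multiple
    return lower, upper


lower, upper = nearest_valid_sizes(112)
print(lower, upper)  # 112 is not divisible by 32, so the neighbours are 96 and 128

# With timm installed, one of those sizes could then be passed on creation,
# as described in the comment above:
#   model = timm.create_model("maxvit_rmlp_tiny_rw_256", img_size=upper)
```

Input images would then need to be resized or padded to the chosen size before being fed to the model.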

It is possible to add resize support for the tf and other maxvit models, but they require extra code to interpolate the position embeddings, which is a bit of work. Any help is appreciated here; the TensorFlow-style pos embed and the Swin style would each require a different interpolation impl.

rwightman avatar Dec 19 '22 18:12 rwightman

Hey @rwightman, do you still want someone to take a look at this?

xvr-hlt avatar Apr 17 '23 04:04 xvr-hlt

@xvr-hlt yeah, if someone wants to tackle the interpolation code (a bit different for both the TF style and swin style) that'd be great.

There's some code to base this on. The TF one can resize on the fly; the swin approach is done when pretrained weights are loaded into a model initialized with a different size.

https://github.com/microsoft/Swin-Transformer/blob/f92123a0035930d89cf53fcb8257199481c4428d/utils.py#L61

and

https://github.com/google-research/maxvit/blob/main/maxvit/models/maxvit.py#L198-L237
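For orientation, here is a minimal sketch of the Swin-style resize-on-load idea: the relative position bias table is reshaped into a 2D grid per head, interpolated to the new window size, and flattened back. The function name `resize_rel_pos_bias_table` is hypothetical, the window is assumed square, and bicubic interpolation is an assumption that is believed to match the linked Swin utility. This is not the timm implementation, and the TF-style embedding would need different handling.

```python
import torch
import torch.nn.functional as F


def resize_rel_pos_bias_table(table: torch.Tensor, new_window: int) -> torch.Tensor:
    """Resize a Swin-style relative position bias table for a new square window.

    table: shape (L, num_heads) with L = (2 * old_window - 1) ** 2.
    Returns a tensor of shape ((2 * new_window - 1) ** 2, num_heads).
    Hypothetical helper for illustration only.
    """
    old_len, num_heads = table.shape
    old_side = int(round(old_len ** 0.5))
    if old_side * old_side != old_len:
        raise ValueError("expected (2 * window - 1) ** 2 rows in the table")
    new_side = 2 * new_window - 1
    if new_side == old_side:
        return table  # nothing to do when the window size is unchanged

    # (L, H) -> (1, H, S, S) so each head is treated as an image channel.
    grid = table.permute(1, 0).reshape(1, num_heads, old_side, old_side)
    grid = F.interpolate(
        grid.float(), size=(new_side, new_side), mode="bicubic", align_corners=False
    )
    # (1, H, S', S') -> (L', H), restoring the original dtype.
    return grid.reshape(num_heads, new_side * new_side).permute(1, 0).to(table.dtype)


# Example: a table trained with window 7 (13 x 13 offsets) resized for window 8.
pretrained = torch.randn(13 * 13, 4)
resized = resize_rel_pos_bias_table(pretrained, new_window=8)
print(resized.shape)
```

In a resize-on-load flow, a function like this would be applied to each matching entry of the checkpoint state dict before calling `load_state_dict` on a model built with the new size.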

rwightman avatar Apr 21 '23 04:04 rwightman

@rwightman it might be a good time to move this convo into a WIP PR, but while we're here: did you imagine this was a solution specifically for the maxxvit models defined here, or something more general purpose? If it's the former, is there any way of knowing which config models are SWIN style, and which are tf? I'm assuming the tf models are anything that composes from the _tf_cfg dict, but what about the SWIN style models?

xvr-hlt avatar Apr 24 '23 06:04 xvr-hlt

maxvit, coatnet, and swin models support resize on load now. It would be possible to support dynamic resize as well, but that would add a small runtime penalty and more complexity, so I'm sticking with resize on load.

rwightman avatar Aug 21 '23 23:08 rwightman