
[FEATURE] Support FasterViT

Open seefun opened this issue 2 years ago • 3 comments

“FasterViT: Fast Vision Transformers with Hierarchical Attention”

https://github.com/NVlabs/FasterViT


The code is written based on timm and provides pretrained ImageNet-1k weights, but many of its layers are custom implementations that differ from timm's, so I'm not sure whether significant adjustments to this code would be needed.

It looks interesting, but it doesn't seem like the paper has been released.

seefun avatar Jun 12 '23 06:06 seefun

Yeah, noticed this one. It is timm-oriented, but as always it bakes in square image size assumptions and puts the downsample at the end of the blocks, so it needs a decent amount of attention to fix and remap :(

I really truly don't understand the obsession with putting the downsample at the end of vit/hybrid blocks :(

The other thing is, I've never found gcvit (same authors) particularly easy to train or fine-tune (including reproducing the original results) compared to vit, swin, or convnext (which I've successfully managed to reproduce and improve on). I wonder how this one compares... given the complexity of the model code, I found the throughput numbers surprising, as more code usually means more activations and slower speeds.

rwightman avatar Jun 13 '23 17:06 rwightman
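
For anyone unfamiliar with the distinction, here is a minimal sketch of the two stage layouts being discussed. The module names, dims, and plain strided-conv downsample are illustrative assumptions, not the actual FasterViT or timm implementations:

```python
import torch.nn as nn


class StageDownsampleLast(nn.Module):
    """Downsample-at-end stage (the layout described above): blocks run at the
    incoming resolution and the downsample sits at the *end* of the stage."""

    def __init__(self, dim, out_dim, depth):
        super().__init__()
        # stand-ins for the real attention / conv blocks
        self.blocks = nn.Sequential(*[nn.Identity() for _ in range(depth)])
        self.downsample = nn.Conv2d(dim, out_dim, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        x = self.blocks(x)
        return self.downsample(x)


class StageDownsampleFirst(nn.Module):
    """Downsample-at-start stage (the usual timm convention): the downsample
    sits at the *start* of the stage, so a stage's output resolution matches
    its blocks and per-stage feature taps line up cleanly."""

    def __init__(self, dim, out_dim, depth, downsample=True):
        super().__init__()
        self.downsample = (
            nn.Conv2d(dim, out_dim, kernel_size=3, stride=2, padding=1)
            if downsample else nn.Identity()
        )
        self.blocks = nn.Sequential(*[nn.Identity() for _ in range(depth)])

    def forward(self, x):
        x = self.downsample(x)
        return self.blocks(x)
```

With the first layout, the tensor a stage returns is already at the next stage's resolution, which is what makes feature extraction and weight remapping awkward when porting to the second layout.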

Hi guys, is there any update on this issue? The throughput is really high.

tp-nan avatar Aug 11 '23 09:08 tp-nan

Hi, I can take this one. I'll begin by moving the downsamples as mentioned here.

youssefadr avatar Sep 03 '23 14:09 youssefadr
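
In case it helps whoever picks this up, a minimal sketch of the kind of checkpoint key remap that moving the downsamples implies. The `levels.{i}.downsample.*` key pattern is an assumption for illustration, not the real FasterViT or timm state-dict naming:

```python
import re


def remap_downsample_keys(state_dict):
    """Hypothetical filter: a downsample that ended stage i in the original
    checkpoint becomes the downsample that starts stage i + 1 in the port.
    Key names here are assumed, not the actual FasterViT/timm layout."""
    out = {}
    for k, v in state_dict.items():
        m = re.match(r'levels\.(\d+)\.downsample\.(.*)', k)
        if m:
            # shift the downsample weights to the following stage
            k = f'levels.{int(m.group(1)) + 1}.downsample.{m.group(2)}'
        out[k] = v
    return out
```

In timm, a remap like this would typically live in the model file's checkpoint filter function so the original pretrained weights load transparently, but the exact integration is up to the port.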