Any plan to add `Swin` transformer?

Open klae01 opened this issue 3 years ago • 2 comments

Swin Transformer achieves higher accuracy than ViT at a similar model size and computational cost. I think training it with CLIP's method and dataset would give even better performance.

ViT-B/16, 384x384, 86M params, 55.4 GFLOPs, 77.9 (ImageNet-1K top-1 acc)
Swin-B, 384x384, 88M params, 47.0 GFLOPs, 84.5 (ImageNet-1K top-1 acc)

Reference: Swin Transformer paper, Table 1(a)
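
To make the gap concrete, here is a minimal sketch that computes the relative differences from the figures quoted above. The dictionary layout and the `relative_change` helper are only illustrative names, not part of any library.

```python
# Figures copied from the comparison above (Swin Transformer paper, Table 1(a)).
models = {
    "ViT-B/16": {"params_m": 86, "gflops": 55.4, "top1": 77.9},
    "Swin-B": {"params_m": 88, "gflops": 47.0, "top1": 84.5},
}


def relative_change(new, old):
    """Percentage change of `new` relative to `old` (illustrative helper)."""
    return (new - old) / old * 100.0


vit, swin = models["ViT-B/16"], models["Swin-B"]
params_delta = relative_change(swin["params_m"], vit["params_m"])
flops_delta = relative_change(swin["gflops"], vit["gflops"])
acc_gain = swin["top1"] - vit["top1"]  # absolute difference in accuracy points

print(f"params: {params_delta:+.1f}%")
print(f"FLOPs:  {flops_delta:+.1f}%")
print(f"top-1:  {acc_gain:+.1f} points")
```

By these numbers, Swin-B has about 2% more parameters, uses about 15% fewer FLOPs, and scores 6.6 points higher on ImageNet-1K.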

It would be great if a Swin Transformer backbone could be added so that its performance can be compared.

Although ResNet is still powerful, its performance was much worse than ViT's, and I would like to check whether that really shows that transformers overwhelmingly outperform convolutions in the vision field. It would therefore also be nice to add well-known convolution-based networks such as EfficientNet and ConvNeXt.

Aug 14 '22 09:08 klae01

It is exactly what I've expected, too.

Sep 28 '22 07:09 celestialxevermore

I've expected this, too.

Jan 13 '23 16:01 MaAo