pytorch-image-models

[FEATURE] Feature extraction for SWIN Transformer

Open Animatory opened this issue 4 years ago • 4 comments

The creators of SWIN Transformer have published several applications of it to object detection and semantic segmentation. However, their implementation differs slightly from the original SWIN for image classification (BasicLayer has additional operations before the main part). Are you planning to add this feature extraction part to your version?

Animatory, May 05 '21 16:05

Mentioned in #607: yes, the plan is to add feature extraction, but in a way that's generic for all non-CNN archs (so the various vision transformers and the new MLP-Mixer nets). I have other things to do first, and I haven't quite figured out the interface with respect to my existing feature helpers for CNNs.

rwightman, May 05 '21 18:05

Hey @rwightman, once you have a good idea of the interface I'm happy to help with this. I'd like to use it for my experimentation.

One approach for e.g. ViT/DeiT/SWIN would be to change the blocks so that they take and return non-flattened input (e.g. shape B, C, H, W), and flatten/unflatten internally (to B, C, H*W). This would allow the forward method for features_only models to work without change, and in particular would mean they could be used as-is for U-Nets and other segmentation models.
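A minimal sketch of that idea, not code from timm or from this thread: a hypothetical `SpatialBlockWrapper` accepts a (B, C, H, W) tensor, flattens it to a token sequence, runs a token-based block, and restores the spatial layout. It assumes the wrapped block expects (B, N, C) tokens, so a transpose is added after the flatten, and that the block preserves the channel count. `nn.TransformerEncoderLayer` is only a stand-in for a ViT/SWIN block.

```python
import torch
import torch.nn as nn


class SpatialBlockWrapper(nn.Module):
    """Hypothetical wrapper: (B, C, H, W) in, (B, C, H, W) out."""

    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block  # any module mapping (B, N, C) -> (B, N, C)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # (B, C, H, W) -> (B, C, H*W) -> (B, H*W, C)
        tokens = x.flatten(2).transpose(1, 2)
        tokens = self.block(tokens)
        # (B, H*W, C) -> (B, C, H*W) -> (B, C, H, W)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


# Stand-in token block (assumption: not an actual ViT/SWIN block).
block = nn.TransformerEncoderLayer(
    d_model=32, nhead=4, dim_feedforward=64, batch_first=True
)
wrapped = SpatialBlockWrapper(block)
out = wrapped(torch.randn(2, 32, 14, 14))
print(out.shape)
```

Because the wrapper's input and output are both spatial, a stack of such blocks could in principle be consumed by code that expects CNN-style feature maps. Stages that change resolution or width (such as patch merging in SWIN) would need the output shape computed rather than reused from the input.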

xvr-hlt, May 07 '21 04:05

Is there any update?

https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/backbones/swin.py#L746

realeve, Sep 23 '21 13:09

@rwightman It seems difficult to implement an FPN for ViT using the same criteria as for CNNs, but it is achievable for staged models such as Swin (4 stages) and CSWin (4 stages). Could these staged models be implemented first? We hope to test the performance of some ViT models on downstream tasks within a single framework (e.g. timm).

realeve, Sep 23 '21 13:09

Supported on the main branch now, with NHWC output (see #1438 for more).
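Since the features come out as NHWC, downstream heads that expect NCHW (most FPN or U-Net style decoders) need a permute first. The sketch below shows only that conversion, on dummy tensors. The helper name `nhwc_to_nchw` and the four stage shapes are illustrative assumptions, not taken from this thread.

```python
import torch


def nhwc_to_nchw(features):
    """Convert a list of (B, H, W, C) feature maps to (B, C, H, W)."""
    return [f.permute(0, 3, 1, 2).contiguous() for f in features]


# Dummy stand-ins for a 4-stage hierarchy (shapes are assumed for illustration).
# With timm, such a list would presumably come from a model created with
# timm.create_model(..., features_only=True) and called on an image batch.
feats = [
    torch.randn(1, 56, 56, 96),
    torch.randn(1, 28, 28, 192),
    torch.randn(1, 14, 14, 384),
    torch.randn(1, 7, 7, 768),
]

nchw = nhwc_to_nchw(feats)
for f in nchw:
    print(tuple(f.shape))
```

The `.contiguous()` call copies the data into the new layout. It can be dropped if the consumer handles strided tensors, at the cost of possibly slower convolutions.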

rwightman, Mar 20 '23 04:03