
Properly get feature maps from Swin-V2 instead of norm

Open sarmientoj24 opened this issue 3 years ago • 10 comments

When using Swin-V2 as a backbone, how do I get the intermediate feature maps without the final norm applied?

sarmientoj24 avatar Sep 04 '22 09:09 sarmientoj24

I tried a different Swin-V2 repo and I am getting these four feature maps:

    (torch.Size([1, 96, 64, 64]),
     torch.Size([1, 192, 32, 32]),
     torch.Size([1, 384, 16, 16]),
     torch.Size([1, 768, 8, 8]))

I tried editing your `forward_features` like this:

    def forward_features(self, x):
        x = self.patch_embed(x)
        if self.absolute_pos_embed is not None:
            x = x + self.absolute_pos_embed
        x = self.pos_drop(x)
        
        outputs = []
        print(len(self.layers))
        for layer in self.layers:
            x = layer(x)
            print(x.shape)
            outputs.append(x)

        # x = self.norm(x)  # B L C
        return outputs

model = timm.create_model('swinv2_tiny_window8_256', img_size=(256, 256), num_classes=2, pretrained=True)
sample = model.forward_features(torch.randn(1, 3, 256, 256))

but I get these shapes instead:

torch.Size([1, 1024, 192])
torch.Size([1, 256, 384])
torch.Size([1, 64, 768])
torch.Size([1, 64, 768])
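(For reference, those outputs are token sequences in `(B, L, C)` layout rather than spatial maps. A minimal sketch of reshaping one back to `(B, C, H, W)`, assuming the token grid is square — the shapes below are just the ones from this thread, not anything timm-specific:)

```python
import math

import torch

# One of the stage outputs above, in (B, L, C) token layout
x = torch.randn(1, 1024, 192)
B, L, C = x.shape
H = W = int(math.sqrt(L))  # assumes a square feature map
feat = x.transpose(1, 2).reshape(B, C, H, W)
print(feat.shape)  # torch.Size([1, 192, 32, 32])
```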

sarmientoj24 avatar Sep 04 '22 09:09 sarmientoj24

@sarmientoj24 unfortunately, swin v1 and v2 (adapted from the official Microsoft modelling code) put the downsample at the end of each stage, so you can't simply take the output of each block ... in their code for object detection/segmentation, they actually have to modify the block to output both feature maps, which is silly — you can just reorganize the stages to have the downsample first... https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/master/mmdet/models/backbones/swin_transformer.py#L362-L402
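(A toy illustration of why the ordering matters — `nn.Identity` stands in for the transformer blocks and a strided conv for patch merging; none of these classes are timm's actual modules:)

```python
import torch
from torch import nn

blocks = lambda: nn.Identity()                # stand-in for the transformer blocks
merge = lambda d: nn.Conv2d(d, d * 2, 2, 2)   # stand-in for patch merging (2x downsample)

# Official Swin ordering: blocks run first, downsample at the END of each stage.
ds_last = nn.ModuleList(
    nn.Sequential(blocks(), merge(d)) for d in (96, 192, 384)
)
# Reorganized ordering: downsample FIRST (identity for stage 0), then blocks.
ds_first = nn.ModuleList(
    nn.Sequential(nn.Identity() if i == 0 else merge(d // 2), blocks())
    for i, d in enumerate((96, 192, 384))
)

def stage_outputs(stages, x):
    """Collect the output shape of each stage, as the modified forward_features does."""
    outs = []
    for stage in stages:
        x = stage(x)
        outs.append(tuple(x.shape))
    return outs

x = torch.randn(1, 96, 64, 64)
print(stage_outputs(ds_last, x))
# [(1, 192, 32, 32), (1, 384, 16, 16), (1, 768, 8, 8)]
# -> the 96-channel, 64x64 map is never exposed; each output is already downsampled
print(stage_outputs(ds_first, x))
# [(1, 96, 64, 64), (1, 192, 32, 32), (1, 384, 16, 16)]
# -> each stage output is at that stage's own working resolution
```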

I actually fixed this in my swin v2 implementation (before the original came out) https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/swin_transformer_v2_cr.py#L597-L616, in theory I could make that implementation work with either v1 or official v2 checkpoints (remap them on load)...

My swinv2_cr impl, and the recent maxvit, coatnet, and gcvit models, could enable features_only support pretty easily, as I organized those models appropriately.

rwightman avatar Sep 06 '22 23:09 rwightman

Does that mean I should just use swinv2_cr instead?

sarmientoj24 avatar Sep 07 '22 04:09 sarmientoj24

@sarmientoj24 it'd be worth testing your use case with swinv2_cr_small_ns_224 to see if it works better; if so, I could prioritize making the other v1/v2 models available through that impl...

rwightman avatar Sep 07 '22 15:09 rwightman

@rwightman

model = timm.create_model('swinv2_cr_small_ns_224', img_size=(224, 224), num_classes=2, features_only=True, pretrained=True)


RuntimeError: features_only not implemented for Vision Transformer models.

When I remove `features_only`:

sample = model.forward_features(torch.randn(2, 3, 224, 224))
sample.shape

torch.Size([2, 768, 7, 7])

Using `forward`:

sample = model.forward(torch.randn(2, 3, 224, 224))
sample.shape

torch.Size([2, 2])

sarmientoj24 avatar Sep 08 '22 06:09 sarmientoj24

It seems the cr version is based on Christopher's implementation, which I have already used before. That one has no sequential self-attention, right? And neither does this one?

sarmientoj24 avatar Sep 08 '22 06:09 sarmientoj24

@sarmientoj24 it was based on his impl but ended up with quite a few changes; I didn't include the sequential attn as I wasn't convinced it was working properly...

The forward_features isn't added yet. What I meant was that you should be able to apply the modifications you made for the other swin and get the shapes you expect from the blocks; if that works as you expect, it would be a good signal for me to add full support...

rwightman avatar Sep 09 '22 05:09 rwightman

@rwightman is the sequential attention a default on microsoft's original Swin-V2?

sarmientoj24 avatar Sep 09 '22 07:09 sarmientoj24

@sarmientoj24 no, they did not release it or any of the 'really' big models that use it like giant...

rwightman avatar Sep 09 '22 17:09 rwightman

RuntimeError: Unknown model (swinv2_tiny_window8_256)

Why does this happen and how do I fix it?

Bailey-24 avatar Dec 09 '22 10:12 Bailey-24

as per #1438 ... feature extraction for swin v1 & v2 is supported now, in NHWC format (the v2_cr models support NCHW like other convnets, but there's a slight performance penalty to doing that for the other v1/v2 setup, so I didn't permute by default).

rwightman avatar Mar 20 '23 04:03 rwightman