video-swin-transformer-pytorch
The shape of the logits
The output 'logits' has shape (1, 768, 8, 7, 7), but it should be (batch, num_class). How can I adapt the code to classify videos?
The fc layer is defined in base.py under https://github.com/SwinTransformer/Video-Swin-Transformer/tree/master/mmaction/models/recognizers in the official implementation.
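In the meantime, one way to get (batch, num_class) logits is to put a small head on top of the backbone output yourself. The sketch below is only an illustration, not the official code: the class name `VideoClassificationHead`, the dropout value, and `num_classes=400` are assumptions chosen for the example. It averages the (1, 768, 8, 7, 7) feature map over the temporal and spatial dimensions and applies a linear layer.

```python
import torch
import torch.nn as nn


class VideoClassificationHead(nn.Module):
    """Hypothetical head: pools backbone features and maps them to class logits."""

    def __init__(self, in_channels: int = 768, num_classes: int = 400, dropout: float = 0.5):
        super().__init__()
        # Global average pooling over (T, H, W); works for any input size.
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        x = self.pool(features)      # (B, C, T, H, W) -> (B, C, 1, 1, 1)
        x = torch.flatten(x, 1)      # (B, C)
        x = self.dropout(x)
        return self.fc(x)            # (B, num_classes)


# Stand-in for the backbone output reported above (random values, not real features).
features = torch.randn(1, 768, 8, 7, 7)

head = VideoClassificationHead(in_channels=768, num_classes=400).eval()
with torch.no_grad():
    logits = head(features)

print(logits.shape)  # torch.Size([1, 400])
```

The head is randomly initialized here, so it has to be trained (or loaded with matching pretrained weights) before the logits mean anything. Set `num_classes` to the number of classes in your own dataset.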