Some questions about the "mmaction/models/common/transformer.py" file.
Hello, I'm confused while debugging TimeSformer. In the file "mmaction/models/common/transformer.py", around line 77, it says `res_temporal = self.attn(query_t, query_t, query_t)[0].permute(1, 0, 2)`. I don't understand why the same tensor `query_t` is passed three times as Q, K, and V to compute attention. In the original paper, however, a linear projection is used to obtain Q, K, and V. Looking forward to your answer, thank you very much.
@WP-CV I think you are right. @Dai-Wenxun, could you have a look?
@congee524 Could you please help check this issue?
@WP-CV Hi, our implementation uses `nn.MultiheadAttention`. When q, k, and v are the same tensor (i.e. `query_t`), it first applies the learned input projections to obtain the projected tensors Q, K, and V, and then computes multi-head attention. So our implementation is the same as the paper. For reference: https://github.com/pytorch/pytorch/blob/a4dca9822dfabcdbd1b36a12c013764f2af87613/torch/nn/functional.py#L4749-L4753
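For illustration, here is a minimal sketch (not the mmaction code itself; `embed_dim=768`, `num_heads=12`, and the tensor shapes are assumptions for the example) showing that passing the same tensor three times to `nn.MultiheadAttention` still applies separate learned Q/K/V projections internally:

```python
import torch
import torch.nn as nn

# Assumed dimensions for illustration only (not taken from the TimeSformer config).
embed_dim, num_heads = 768, 12
attn = nn.MultiheadAttention(embed_dim, num_heads)

# query_t: (seq_len, batch, embed_dim), the default batch_first=False layout.
query_t = torch.randn(16, 2, embed_dim)

# Passing query_t as query, key and value is self-attention: internally,
# multi_head_attention_forward slices in_proj_weight into W_q, W_k, W_v and
# computes q = query_t @ W_q.T, k = query_t @ W_k.T, v = query_t @ W_v.T
# before the scaled dot-product attention, i.e. the linear projections the
# paper describes.
out, _ = attn(query_t, query_t, query_t)
print(out.shape)  # torch.Size([16, 2, 768])
```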
OK, I get it, thank you. I overlooked nn.MultiheadAttention's internal projection.
@WP-CV If you have any further questions, feel free to re-open the issue. Thanks!