Swin-Transformer
Why is the gradient stopped for the bias of the k projection?
https://github.com/microsoft/Swin-Transformer/blob/afeb877fba1139dfbc186276983af2abb02c2196/models/swin_transformer_v2.py#L149
torch.cat((self.q_bias, torch.zeros_like(self.v_bias, requires_grad=False), self.v_bias))
It is equivalent to the algorithm with a k bias, just simpler. With biases, the attention logit is (q_i + b_q)·(k_j + b_k) = (q_i + b_q)·k_j + (q_i + b_q)·b_k, and the second term is constant over j for each query i, so the row-wise softmax cancels it exactly. The k bias therefore has no effect on the output and can be fixed at zero, which is what the `zeros_like(..., requires_grad=False)` does. You can derive it yourself; a numerical check is sketched below.
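A minimal sketch (not the repo's code; shapes and names here are illustrative) that verifies the equivalence numerically with plain scaled dot-product attention:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, N, D = 2, 5, 8  # batch, tokens, head dim

q = torch.randn(B, N, D)
k = torch.randn(B, N, D)
v = torch.randn(B, N, D)
q_bias = torch.randn(D)
k_bias = torch.randn(D)  # hypothetical k bias

def attn(q, k, v):
    logits = (q @ k.transpose(-2, -1)) / D ** 0.5
    return F.softmax(logits, dim=-1) @ v

# With a k bias: logit_ij = (q_i + b_q) . (k_j + b_k)
out_with_kbias = attn(q + q_bias, k + k_bias, v)

# Without it: the extra term (q_i + b_q) . b_k does not depend on j,
# so the softmax over j (dim=-1) cancels it exactly.
out_without = attn(q + q_bias, k, v)

print(torch.allclose(out_with_kbias, out_without, atol=1e-5))  # True
```

This is why setting the k bias to a frozen zero tensor changes nothing: softmax is invariant to adding a per-row constant to its input.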