Swin-Transformer
Why is the gradient stopped for the bias of the k projection?
https://github.com/microsoft/Swin-Transformer/blob/afeb877fba1139dfbc186276983af2abb02c2196/models/swin_transformer_v2.py#L149
torch.cat((self.q_bias, torch.zeros_like(self.v_bias, requires_grad=False), self.v_bias))
It is equivalent to the algorithm with a k bias, just simpler. With biases, the attention logit is (q_i + b_q)·(k_j + b_k) = (q_i + b_q)·k_j + (q_i + b_q)·b_k, and the second term is constant over j for each query i, so the row-wise softmax cancels it exactly. The k bias therefore has no effect on the output and can be fixed at zero, which is what the `zeros_like(..., requires_grad=False)` does. You can derive it yourself; a numerical check is sketched below.
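A minimal sketch (not the repo's code; shapes and names here are illustrative) that verifies the equivalence numerically with plain scaled dot-product attention:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, N, D = 2, 5, 8  # batch, tokens, head dim

q = torch.randn(B, N, D)
k = torch.randn(B, N, D)
v = torch.randn(B, N, D)
q_bias = torch.randn(D)
k_bias = torch.randn(D)  # hypothetical k bias

def attn(q, k, v):
    logits = (q @ k.transpose(-2, -1)) / D ** 0.5
    return F.softmax(logits, dim=-1) @ v

# With a k bias: logit_ij = (q_i + b_q) . (k_j + b_k)
out_with_kbias = attn(q + q_bias, k + k_bias, v)

# Without it: the extra term (q_i + b_q) . b_k does not depend on j,
# so the softmax over j (dim=-1) cancels it exactly.
out_without = attn(q + q_bias, k, v)

print(torch.allclose(out_with_kbias, out_without, atol=1e-5))  # True
```

This is why setting the k bias to a frozen zero tensor changes nothing: softmax is invariant to adding a per-row constant to its input.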