Swin-Transformer
Maybe there is a mistake in line 98 of "swin_transformer.py"
In line 98 of https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py:

```python
self.scale = qk_scale or head_dim ** -0.5
```

If `qk_scale` is not None, the value of `self.scale` will always be `qk_scale`, which is inconsistent with the self-attention equation in "Attention Is All You Need" and Eq. 4 in the paper. I think it should be `self.scale = (qk_scale or head_dim) ** -0.5`?
@zeliu98
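(For reference, Python's `**` binds more tightly than `or`, so the line parses as `qk_scale or (head_dim ** -0.5)`, not `(qk_scale or head_dim) ** -0.5`. A minimal sketch with a made-up `head_dim` showing both cases:)

```python
# Hypothetical head_dim, chosen only to illustrate how the line parses.
head_dim = 32

# Case 1: qk_scale is None -> the `or` falls through to head_dim ** -0.5.
qk_scale = None
scale = qk_scale or head_dim ** -0.5
print(scale)  # 0.17677... == 32 ** -0.5, the standard 1/sqrt(d) scaling

# Case 2: qk_scale is set -> it overrides the default entirely.
qk_scale = 0.1
scale = qk_scale or head_dim ** -0.5
print(scale)  # 0.1
```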
Hi @DavidZhang88, this is not a bug. By default, `qk_scale` is None, and `self.scale` is set to `head_dim ** -0.5`, which is consistent with "Attention Is All You Need". But we also allow `self.scale` to be a manually set constant value `qk_scale` (when `qk_scale` is not None). Though this is not consistent with "Attention Is All You Need", it can be helpful in some situations.
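(To make this concrete, here is a minimal sketch of how the chosen scale enters scaled dot-product attention. This is not the repository's actual `WindowAttention` module, which also adds a relative position bias and window masking; the function name and tensor shapes below are made up for illustration:)

```python
import torch

def scaled_attention(q, k, v, head_dim, qk_scale=None):
    # Default: 1/sqrt(head_dim), as in "Attention Is All You Need".
    # A non-None qk_scale overrides it with a manual constant.
    # Note the `or` idiom treats 0 as falsy, so qk_scale=0 would also
    # fall through to the default.
    scale = qk_scale or head_dim ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale
    attn = attn.softmax(dim=-1)
    return attn @ v

# Usage with random tensors: batch=2, heads=4, tokens=49, head_dim=32.
q = torch.randn(2, 4, 49, 32)
k = torch.randn(2, 4, 49, 32)
v = torch.randn(2, 4, 49, 32)
out_default = scaled_attention(q, k, v, head_dim=32)                # 1/sqrt(32)
out_manual = scaled_attention(q, k, v, head_dim=32, qk_scale=0.1)   # manual constant
```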
Oh, my bad, thank you so much for the explanation. I am sorry for misunderstanding your code. T_T I hope you will achieve even greater things in the field of computer vision! Thank you again! ^_^