Swin-Transformer
Maybe there is a mistake in line 98 of "swin_transformer.py"
In line 98 of https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py:

```python
self.scale = qk_scale or head_dim ** -0.5
```

If `qk_scale` is not None, the value of `self.scale` will always be `qk_scale`, which is inconsistent with the self-attention equation in "Attention Is All You Need" and Eq. 4 in the paper. I think it should be `self.scale = (qk_scale or head_dim) ** -0.5`?
@zeliu98
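(For reference, Python's `**` binds more tightly than `or`, so the line parses as `qk_scale or (head_dim ** -0.5)`, not `(qk_scale or head_dim) ** -0.5`. A minimal sketch with a made-up `head_dim` showing both cases:)

```python
# Hypothetical head_dim, chosen only to illustrate how the line parses.
head_dim = 32

# Case 1: qk_scale is None -> the `or` falls through to head_dim ** -0.5.
qk_scale = None
scale = qk_scale or head_dim ** -0.5
print(scale)  # 0.17677... == 32 ** -0.5, the standard 1/sqrt(d) scaling

# Case 2: qk_scale is set -> it overrides the default entirely.
qk_scale = 0.1
scale = qk_scale or head_dim ** -0.5
print(scale)  # 0.1
```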
Hi @DavidZhang88, this is not a bug. By default, `qk_scale` is None, and `self.scale` is set to `head_dim ** -0.5`, which is consistent with "Attention Is All You Need". But we also allow `self.scale` to be a manually set constant value `qk_scale` (when `qk_scale` is not None). Though this is not consistent with "Attention Is All You Need", it can be helpful in some situations.
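(To make this concrete, here is a minimal sketch of how the chosen scale enters scaled dot-product attention. This is not the repository's actual `WindowAttention` module, which also adds a relative position bias and window masking; the function name and tensor shapes below are made up for illustration:)

```python
import torch

def scaled_attention(q, k, v, head_dim, qk_scale=None):
    # Default: 1/sqrt(head_dim), as in "Attention Is All You Need".
    # A non-None qk_scale overrides it with a manual constant.
    # Note the `or` idiom treats 0 as falsy, so qk_scale=0 would also
    # fall through to the default.
    scale = qk_scale or head_dim ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale
    attn = attn.softmax(dim=-1)
    return attn @ v

# Usage with random tensors: batch=2, heads=4, tokens=49, head_dim=32.
q = torch.randn(2, 4, 49, 32)
k = torch.randn(2, 4, 49, 32)
v = torch.randn(2, 4, 49, 32)
out_default = scaled_attention(q, k, v, head_dim=32)                # 1/sqrt(32)
out_manual = scaled_attention(q, k, v, head_dim=32, qk_scale=0.1)   # manual constant
```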
Oh, my bad, thank you so much for the explanation. I am sorry for misunderstanding your code. T_T I hope you will achieve even greater things in the field of computer vision! Thank you again! ^_^