Swin-Transformer
Why use 3 slices to generate the mask?
Here you use 3 slices per dimension to generate the mask, which yields 9 different region labels. But if I understand correctly, only 4 different kinds of masks are needed (as shown in Figure 4 of your paper, different windows can share the same mask), so 2 slices per dimension (i.e., `slice(0, -self.shift_size)` and `slice(-self.shift_size, None)`) should be sufficient, shouldn't they?
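To make the question concrete, here is a minimal sketch that builds the mask both ways and compares the results. It is not the repository's code: it uses NumPy instead of PyTorch, the helper names `window_partition` and `build_mask` are my own, and the sizes (a 12x12 map, window size 4, shift size 2) are assumed for illustration only. It also returns a boolean "blocked" mask rather than the additive float mask a real attention layer would use.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W) label map into rows of shape (num_windows, window_size**2)."""
    H, W = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size)
    return x.transpose(0, 2, 1, 3).reshape(-1, window_size * window_size)

def build_mask(H, W, window_size, slices):
    """Label regions using the given slices (same slices for both axes),
    then flag pairs of positions inside a window whose labels differ."""
    img = np.zeros((H, W), dtype=np.int64)
    cnt = 0
    for h in slices:
        for w in slices:
            img[h, w] = cnt
            cnt += 1
    windows = window_partition(img, window_size)
    # True means the two positions belong to different regions (attention blocked).
    return windows[:, None, :] != windows[:, :, None]

# Assumed sizes for illustration only.
H = W = 12
window_size, shift_size = 4, 2

three_slices = (slice(0, -window_size),
                slice(-window_size, -shift_size),
                slice(-shift_size, None))
two_slices = (slice(0, -shift_size),
              slice(-shift_size, None))

mask3 = build_mask(H, W, window_size, three_slices)
mask2 = build_mask(H, W, window_size, two_slices)

same = np.array_equal(mask3, mask2)
num_distinct = len({m.tobytes() for m in mask3})
print("masks identical:", same)
print("distinct per-window masks:", num_distinct)
```

Under these assumed sizes the two constructions give identical per-window masks, because the mask depends only on which positions within a window share a label, not on the label values themselves. I have not checked every configuration, so treat this as a sanity check rather than a proof.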