Han Hu
The operators are tested only under the 1.1.0 and 1.3.0 branches. You may try them under these branches.
If you encounter NaN, please retry until there is no NaN; some random initializations can cause divergence problems. If the problem still exists, it might be because the base lr...
> Hi!
>
> To combine the Swin transformer backbone with the Deformable DETR detector, [SOLQ](https://github.com/megvii-research/SOLQ/blob/main/models/swin_transformer.py) made some changes to `swin_transformer.py` that allow computing the padding mask dynamically and allow for...
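For reference, the usual DETR-style pattern for deriving such a padding mask dynamically is a single interpolation per backbone stage. This is a minimal sketch assuming a boolean mask (True = padded pixels) travels with the batched images; it is not a copy of SOLQ's code.

```python
import torch
import torch.nn.functional as F

def mask_for_feature_map(mask: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
    """Downsample a per-pixel padding mask (True = padded) to a feature map's
    spatial resolution. mask: (B, H, W) bool; feat: (B, C, h, w)."""
    # Interpolate the float mask to the feature resolution, then re-binarize.
    return F.interpolate(mask[None].float(), size=feat.shape[-2:]).to(torch.bool)[0]

images = torch.randn(2, 3, 224, 224)
mask = torch.zeros(2, 224, 224, dtype=torch.bool)   # True marks padded pixels
feat = torch.randn(2, 256, 28, 28)                  # one backbone stage output
feat_mask = mask_for_feature_map(mask, feat)        # (2, 28, 28) bool
```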
> Thanks for sharing! I suggest you also try the Swin V2 models with SimMIM pre-training. In our experience, SimMIM pre-training should be friendlier for low-level...
> Pre-norm only bounds the activations at the input of each block, not the output. The outputs can accumulate and grow larger and larger in deeper layers.
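A toy illustration of that accumulation, assuming a plain stack of LayerNorm + MLP residual blocks (illustrative names, not the Swin code): the residual stream itself is never normalized, so its magnitude grows with depth.

```python
import torch
import torch.nn as nn

dim, depth = 64, 12
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    for _ in range(depth)
)
norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))

x = torch.randn(8, dim)
for i, (norm, block) in enumerate(zip(norms, blocks)):
    # Pre-norm: only the branch *input* is normalized; the sum is not.
    x = x + block(norm(x))
    print(f"layer {i}: mean activation L2 norm = {x.norm(dim=-1).mean():.2f}")
```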
> `torch.cat((self.q_bias, torch.zeros_like(self.v_bias, requires_grad=False), self.v_bias))` It is equivalent to the algorithm with a k bias but simpler. You can derive it yourself; it is very simple.
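To spell out the derivation and where that concatenation sits: softmax is invariant to adding a constant to every logit in a row, and a key bias b_k shifts row i of q @ kᵀ by the constant q_i · b_k, so the k bias can be fixed to zero without changing the attention output. A minimal sketch of the projection (relative position bias and other Swin details omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Sketch: learnable q/v biases, fixed zero k bias, as in Swin V2."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.q_bias = nn.Parameter(torch.zeros(dim))
        self.v_bias = nn.Parameter(torch.zeros(dim))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        # Zero k bias concatenated between the learnable q and v biases.
        qkv_bias = torch.cat(
            (self.q_bias, torch.zeros_like(self.v_bias, requires_grad=False), self.v_bias)
        )
        qkv = F.linear(x, self.qkv.weight, qkv_bias)
        q, k, v = qkv.reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
        attn = (q * (C // self.num_heads) ** -0.5) @ k.transpose(-2, -1)
        attn = attn.softmax(dim=-1)
        return self.proj((attn @ v).transpose(1, 2).reshape(B, N, C))
```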
> No, it will not get better accuracy. But if you use SimMIM pre-training, Swin V2-L will perform better than Swin V2-B. Please try https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md#simmim-support
> Swin V1 uses pre-norm layers. Swin V2 uses a new normalization configuration named res-post-norm. Please look into https://arxiv.org/pdf/2111.09883.pdf for details.
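A minimal sketch of the two arrangements, with `f` standing in for an attention or MLP branch; only the placement of the LayerNorm differs, but res-post-norm normalizes each branch output before it is added back, keeping the residual stream bounded:

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Swin V1 style: x = x + f(LN(x)); the residual sum is never normalized."""
    def __init__(self, dim: int, f: nn.Module):
        super().__init__()
        self.norm, self.f = nn.LayerNorm(dim), f

    def forward(self, x):
        return x + self.f(self.norm(x))

class ResPostNormBlock(nn.Module):
    """Swin V2 style: x = x + LN(f(x)); each branch output is normalized."""
    def __init__(self, dim: int, f: nn.Module):
        super().__init__()
        self.norm, self.f = nn.LayerNorm(dim), f

    def forward(self, x):
        return x + self.norm(self.f(x))
```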
> I am trying to train an image classifier where the image ground truth contains multiple classes. Is it possible to train a model that outputs multiple classes?

Yes, it should...
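A minimal sketch of how such multi-label training usually looks: swap the single-label cross-entropy for per-class sigmoids with `BCEWithLogitsLoss` and multi-hot targets. The toy linear model here is only a stand-in for a real backbone such as a Swin classifier.

```python
import torch
import torch.nn as nn

num_classes = 10
# Stand-in model; any network producing (B, num_classes) logits works.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, num_classes))

images = torch.randn(4, 3, 224, 224)
# Multi-hot targets: each image may belong to several classes at once.
targets = torch.zeros(4, num_classes)
targets[0, [1, 3]] = 1.0
targets[1, 7] = 1.0

logits = model(images)
loss = nn.BCEWithLogitsLoss()(logits, targets)  # one sigmoid per class
loss.backward()

# At inference, threshold the per-class probabilities independently.
predicted = torch.sigmoid(logits) > 0.5
```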