Swin-Transformer
Swin Transformer does not converge with a large training set.
I am training the tiny model on one million classes and 100 million images, using softmax loss and AdamW. The batch size is 600 and I trained for 400,000 iterations, but the model does not converge.
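
A rough back-of-the-envelope check of the numbers above may help frame the problem. This is only a sketch: it assumes 600 is the global batch size and that the tiny model's final feature width is 768, neither of which is stated in the report.

```python
# Figures taken from the report above.
num_classes = 1_000_000
num_images = 100_000_000
batch_size = 600      # assumption: global batch size, not per-GPU
iterations = 400_000

# Assumption: the tiny model feeds a 768-dimensional feature into the classifier.
feature_dim = 768

samples_seen = batch_size * iterations            # images processed in total
epochs = samples_seen / num_images                # passes over the training set
samples_per_class = samples_seen / num_classes    # average updates seen per class
head_params = feature_dim * num_classes + num_classes  # linear head weights + biases

print(f"samples seen:        {samples_seen:,}")
print(f"epochs completed:    {epochs:.1f}")
print(f"samples per class:   {samples_per_class:.0f}")
print(f"classifier params:   {head_params:,}")
```

Under these assumptions the run covers about 2.4 epochs, each class is seen roughly 240 times on average, and the linear classifier alone has about 769 million parameters.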