pytorch-image-models
What batch sizes other than 1024 have you tried when training a DeiT model?
Which batch sizes other than 1024 have you tried when training a DeiT or ViT model? The DeiT paper (https://arxiv.org/abs/2012.12877) uses a batch size of 1024 and notes that the learning rate should be scaled according to the batch size.
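For reference, the DeiT training recipe applies a linear rule, lr_scaled = lr / 512 × batchsize, with a base lr of 5e-4. That makes the reference batch of 1024 correspond to lr = 1e-3. Below is a minimal sketch of that rule, assuming the paper's base values. The helper name `scaled_lr` is hypothetical and is not a timm API.

```python
# Linear learning-rate scaling as described in the DeiT training recipe:
#   lr_scaled = base_lr * batch_size / reference_batch
# BASE_LR = 5e-4 and REFERENCE_BATCH = 512 are assumed from the paper;
# adjust them if your recipe differs.
BASE_LR = 5e-4
REFERENCE_BATCH = 512


def scaled_lr(batch_size: int, base_lr: float = BASE_LR,
              reference_batch: int = REFERENCE_BATCH) -> float:
    """Return the learning rate linearly scaled to `batch_size` (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return base_lr * batch_size / reference_batch


if __name__ == "__main__":
    # Print the linearly scaled learning rate for a few candidate batch sizes.
    for bs in (1024, 512, 256, 128):
        print(f"batch_size={bs:5d}  lr={scaled_lr(bs):.2e}")
```

Under this rule, halving the batch size halves the learning rate. Whether accuracy holds up at much smaller batches is exactly the open question here, since the rule only sets the starting point.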
However, does anyone here have experience successfully training a DeiT model with a batch size smaller than 512? If so, what accuracy did you achieve?
This would help anyone training on constrained resources who cannot use a batch size of 1024.