Compact-Transformers
Question about the batch size
Hi, this work is awesome. I just have one small question. The paper says the total batch size for the CIFAR datasets is 128 and that 4 GPUs were used in parallel. That doesn't mean the total batch size is 128 * 4 = 512, does it? DDP is used for ImageNet and non-distributed training for CIFAR, am I correct? To make the interpretation I'm asking about concrete, here is a minimal sketch below.
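Just to spell out the arithmetic I have in mind (hypothetical numbers, not taken from the repo's actual launch config): under PyTorch DDP, the batch size passed to each rank's DataLoader is per replica, so the effective global batch size would be the per-GPU batch times the world size.

```python
# Minimal sketch of my understanding, assuming a per-rank DataLoader batch size.
per_gpu_batch = 128   # batch size each DDP process would pass to its DataLoader
world_size = 4        # number of GPUs / DDP processes

effective_batch = per_gpu_batch * world_size
print(effective_batch)  # 512 -- the interpretation I'm asking whether to rule out
```

Whereas if CIFAR training is non-distributed, 128 would simply be the single-process batch size and there is no multiplication by the number of GPUs.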
Thanks a ton :)