Joonsun Auh
Results: 3 comments by Joonsun Auh
I used DP because DDP is not implemented in linear evaluation XD. So when I tried to use 8 GPUs, an error occurred.
Yes, I did not try to use more than 8 GPUs, but 4 GPUs are OK. My batch size is 1024. I tried a batch size of 2048 but...
Then how many times is the last layer trained in the linear evaluation step? When I run the linear evaluation, the last layer always has a training step. I can't find the...