Derek-Gong
I just found that executor.py does not take world_size into account, with the result that a different number of GPUs leads to a different number of steps in one epoch. So, we get different LR...
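To illustrate the effect described above, here is a minimal sketch, not the actual executor.py code. It assumes the dataset is sharded evenly across ranks and that the scheduler advances once per optimizer step. The helper names `steps_per_epoch` and `warmup_lr`, the linear warmup shape, and all numbers are hypothetical and chosen only for the example.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, world_size: int) -> int:
    """Optimizer steps in one epoch when each rank gets an equal shard (assumption)."""
    samples_per_rank = math.ceil(num_samples / world_size)
    return math.ceil(samples_per_rank / batch_size)


def warmup_lr(step: int, base_lr: float = 1e-3, warmup_steps: int = 2500) -> float:
    """Hypothetical step-based linear warmup, flat after warmup_steps."""
    return base_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    # Same data and per-GPU batch size, different number of GPUs.
    for world_size in (1, 2, 4, 8):
        n = steps_per_epoch(num_samples=64000, batch_size=16, world_size=world_size)
        print(f"world_size={world_size} steps/epoch={n} "
              f"lr at end of epoch 1={warmup_lr(n):.6f}")
```

With a step-based schedule like this one, more GPUs means fewer steps per epoch, so the LR reached at the end of a given epoch depends on the GPU count unless the schedule is scaled by world_size.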