Derek-Gong

Results: 1 issue by Derek-Gong

I just found that executor.py does not take world_size into account, so a different number of GPUs leads to a different number of steps per epoch. As a result, we get a different LR...
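A minimal sketch of the effect described above, not the actual executor.py code. It assumes the dataset is sharded evenly across ranks and that the scheduler is stepped once per optimizer step. The helper names (`steps_per_epoch`, `warmup_lr`), the dataset size, the batch size, and the warmup schedule are all hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, world_size: int) -> int:
    """Optimizer steps each rank takes per epoch, assuming even sharding (hypothetical helper)."""
    return math.ceil(num_samples / (batch_size * world_size))


def warmup_lr(step: int, base_lr: float = 1e-3, warmup_steps: int = 1000) -> float:
    """Hypothetical schedule: linear warmup, then inverse-square-root decay."""
    step = max(step, 1)
    return base_lr * min(step / warmup_steps, math.sqrt(warmup_steps / step))


# Same data and per-GPU batch size, different GPU counts (illustrative numbers).
for world_size in (1, 2, 4, 8):
    n = steps_per_epoch(num_samples=64000, batch_size=16, world_size=world_size)
    print(f"world_size={world_size}: steps/epoch={n}, LR after 1 epoch={warmup_lr(n):.6f}")
```

Under these assumptions, the step count per epoch shrinks as world_size grows, so a scheduler indexed by step ends each epoch at a different point on its curve unless the step count or the schedule is scaled by world_size.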