Rulin Shao

Results: 2 comments by Rulin Shao

I could load the saved checkpoint and resume training. The NaN does not appear at the same iteration; instead, it appears every 16900 iterations. I.e., I resumed the training...
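
One way to check whether the NaN really recurs at a fixed interval is to record the global iteration of every non-finite loss and look at the gaps between them. Below is a minimal sketch, assuming the per-iteration loss values are available as plain Python floats. The helper names `nan_iterations` and `gaps` and the `start_iteration` parameter are hypothetical and not taken from the original training code.

```python
import math


def nan_iterations(losses, start_iteration=0):
    """Return the global iteration numbers at which the loss is NaN or inf.

    `start_iteration` is the (assumed) iteration the run was resumed from,
    so that indices are comparable across resumed runs.
    """
    return [
        start_iteration + i
        for i, loss in enumerate(losses)
        if not math.isfinite(loss)
    ]


def gaps(iterations):
    """Return the differences between consecutive iteration numbers."""
    return [later - earlier for earlier, later in zip(iterations, iterations[1:])]


if __name__ == "__main__":
    # Synthetic losses for illustration only, not real training output.
    losses = [1.0, float("nan"), 0.5, 0.4, float("nan"), 0.3, float("inf")]
    bad = nan_iterations(losses, start_iteration=100)
    print("non-finite loss at iterations:", bad)
    print("gaps between them:", gaps(bad))
```

If the gaps come out constant across runs resumed from different checkpoints, that would suggest the NaN is tied to something periodic relative to the resume point (for example a data or schedule cycle) rather than to one specific batch.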