Swarnim Jain

Results: 2 comments by Swarnim Jain

Thanks. I've added gradient accumulation (4 GPUs x batch size 2 x 32 gradient accumulation steps = 256) and the losses do seem to go down. Do you know why the loss scales...
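
For anyone checking the numbers, the effective batch size quoted in the comment above is just the product of the three factors. A minimal sketch, assuming a plain data-parallel setup; the function and parameter names are hypothetical and not taken from any library:

```python
def effective_batch_size(num_gpus: int, per_gpu_batch_size: int, grad_accum_steps: int) -> int:
    """Number of samples contributing to one optimizer step.

    Hypothetical helper for illustration: each GPU processes
    `per_gpu_batch_size` samples per forward/backward pass, and gradients
    are accumulated over `grad_accum_steps` passes before the update.
    """
    return num_gpus * per_gpu_batch_size * grad_accum_steps


# Values from the comment: 4 GPUs, batch size 2 per GPU, 32 accumulation steps.
print(effective_batch_size(4, 2, 32))  # 256
```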

Hi, just wanted to follow up on this.