Swarnim Jain
2 comments
Thanks. I've added gradient accumulation (4 GPUs x batch size 2 per GPU x 32 accumulation steps = effective batch size 256), and the losses do seem to go down. Do you know why the loss scales...
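For reference, the batch-size arithmetic in the comment above can be sketched in plain Python. This is only an illustration, not the poster's training code: the function names (`effective_batch_size`, `mse_grad`, `accumulated_grad`) and the toy one-parameter model are assumptions made up for this example. It also shows the usual convention of dividing each micro-batch contribution by the number of accumulation steps, so that the accumulated gradient matches the gradient of one large batch.

```python
import math


def effective_batch_size(num_gpus, per_gpu_batch, grad_accum_steps):
    # Samples contributing to one optimizer step.
    return num_gpus * per_gpu_batch * grad_accum_steps


def mse_grad(w, xs, ys):
    # Gradient of mean((w * x - y)^2) with respect to w (hypothetical toy model).
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch):
    # Assumes len(xs) is a multiple of micro_batch.
    steps = len(xs) // micro_batch
    total = 0.0
    for i in range(steps):
        lo = i * micro_batch
        # Scale each micro-batch gradient by 1 / steps before accumulating.
        total += mse_grad(w, xs[lo:lo + micro_batch], ys[lo:lo + micro_batch]) / steps
    return total


# The configuration from the comment: 4 GPUs, batch size 2, 32 accumulation steps.
print(effective_batch_size(4, 2, 32))

xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]
print(math.isclose(accumulated_grad(0.5, xs, ys, 2), mse_grad(0.5, xs, ys)))
```

If the per-step division is left out, the accumulated gradient grows with the number of accumulation steps, which is one common source of unexpectedly scaled values; whether that applies here cannot be told from the truncated comment.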
Hi, just wanted to follow up on this.