Sylvain Gugger
That's because with 4 GPUs the effective batch size is 4 times bigger, so the total number of training steps is 4 times smaller.
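To make that arithmetic concrete, here is a minimal sketch. The dataset size, per-device batch size and number of epochs are made-up numbers for illustration, and `total_training_steps` is a hypothetical helper, not something from Accelerate.

```python
import math


def total_training_steps(num_samples, per_device_batch_size, num_processes, num_epochs):
    """Count optimizer steps when every process consumes its own batch per step."""
    # Each step processes one batch per process, so the effective batch
    # size grows with the number of processes (GPUs).
    effective_batch_size = per_device_batch_size * num_processes
    steps_per_epoch = math.ceil(num_samples / effective_batch_size)
    return steps_per_epoch * num_epochs


# Illustrative numbers only: 8000 samples, batch size 8 per GPU, 3 epochs.
steps_1_gpu = total_training_steps(8000, 8, num_processes=1, num_epochs=3)
steps_4_gpus = total_training_steps(8000, 8, num_processes=4, num_epochs=3)
print(steps_1_gpu, steps_4_gpus)
```

With the same per-device batch size, going from 1 to 4 processes divides the step count by 4, which is why a progress bar or scheduler sees fewer total steps.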
If you account for everything yourself, then you don't need to use Accelerate :-)
That error usually comes from a borked install of PyTorch. You should try to re-install it.
cc @muellerzr
cc @younesbelkada
cc @pacman100
It's hard to tell without seeing the script you are running, but it's very likely that you do not have enough RAM to load the model in both processes: each...
Yes, I'm sure many at FAIR use it since it's a facebookincubator project. Still, the last commit is 6 months old. I see an issue opened 6 months...
Indeed, the linear layer needs to be created with the same dtype as the original one. Would you like to suggest a PR with a fix?
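As a rough sketch of what such a fix could look like, assuming the layer in question is a plain `torch.nn.Linear` being swapped for a new one: the helper name `make_linear_like` and the layer sizes are hypothetical, and `float64` is used here only as a stand-in for whatever non-default dtype the original layer has.

```python
import torch
from torch import nn


def make_linear_like(original: nn.Linear, out_features: int) -> nn.Linear:
    """Build a new linear layer that matches the original's dtype and device."""
    return nn.Linear(
        original.in_features,
        out_features,
        bias=original.bias is not None,
        # Without these two arguments the new layer would be created in the
        # default dtype (usually float32) on the default device.
        device=original.weight.device,
        dtype=original.weight.dtype,
    )


# Hypothetical original layer, stored in a non-default dtype.
original = nn.Linear(16, 4).to(torch.float64)
replacement = make_linear_like(original, out_features=8)

# The replacement accepts inputs in the original dtype without a mismatch error.
output = replacement(torch.randn(2, 16, dtype=torch.float64))
print(replacement.weight.dtype, tuple(output.shape))
```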
Yes, you can definitely open a PR with this fix.