Aleksa Gordić
Results: 33 comments of Aleksa Gordić
Adding to what @chinthysl has said, we now also support ZeRO stage 1, where we shard the optimizer states, so each device updates only its own shard of the parameters...
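To make the ZeRO stage 1 idea concrete, here is a rough single-process sketch. It is not the project's actual implementation: the names `shard_bounds`, `zero1_step`, and `reference_step` are hypothetical, the optimizer is plain SGD with momentum chosen for brevity, and the reduce-scatter and all-gather collectives are only simulated with Python loops. It assumes the parameter count divides evenly by the number of ranks.

```python
# Single-process simulation of ZeRO stage 1 (illustrative only).
# No real communication library is used; "ranks" are just list indices.

def shard_bounds(n_params, world_size, rank):
    """Half-open [start, end) range of parameters owned by `rank`."""
    assert n_params % world_size == 0, "sketch assumes even divisibility"
    size = n_params // world_size
    return rank * size, (rank + 1) * size


def zero1_step(params, local_grads, momentum_shards, lr=0.1, beta=0.9):
    """One SGD-with-momentum step with optimizer state sharded across ranks.

    params:          list of floats, replicated on every rank
    local_grads:     local_grads[r] is the full gradient computed on rank r
    momentum_shards: momentum_shards[r] holds state for rank r's shard only
    """
    world_size = len(local_grads)
    n = len(params)
    new_params = list(params)
    for rank in range(world_size):
        start, end = shard_bounds(n, world_size, rank)
        for i in range(start, end):
            # Stands in for reduce-scatter: the rank only receives the
            # averaged gradient for the parameters it owns.
            g = sum(grads[i] for grads in local_grads) / world_size
            j = i - start  # index into this rank's local optimizer state
            momentum_shards[rank][j] = beta * momentum_shards[rank][j] + g
            new_params[i] = params[i] - lr * momentum_shards[rank][j]
    # Stands in for all-gather: every rank ends up with all updated params.
    return new_params


def reference_step(params, local_grads, momentum, lr=0.1, beta=0.9):
    """Unsharded baseline: one full-size momentum buffer, same arithmetic."""
    world_size = len(local_grads)
    new_params = list(params)
    for i in range(len(params)):
        g = sum(grads[i] for grads in local_grads) / world_size
        momentum[i] = beta * momentum[i] + g
        new_params[i] = params[i] - lr * momentum[i]
    return new_params
```

With two ranks and four parameters, each rank keeps a momentum buffer of length two instead of four, yet the updated parameters match the unsharded baseline. The memory saving comes only from the optimizer state; parameters and gradients stay replicated in this stage.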
Hey @akulchik, are you still having problems with this?
+1. Edit: I solved this by using Python 3.9; 3.10 was causing issues. It's a temporary workaround for me.