Aleksa Gordić

33 comments by Aleksa Gordić

Adding to what @chinthysl has said, we now also support ZeRO stage 1, where we shard the optimizer states, so only a shard of gradients is updated on each device...
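For anyone unfamiliar with the idea, here is a toy single-process sketch of ZeRO stage 1 style sharding. It is not the project's actual implementation: `shard_bounds` and `zero1_step` are hypothetical names, the "devices" are just list indices, and SGD with momentum stands in for whatever optimizer is really used. Each simulated rank keeps optimizer state only for its own slice of the parameters and updates only that slice.

```python
def shard_bounds(n_params, world_size, rank):
    """Return the [start, end) slice of parameters owned by `rank` (hypothetical helper)."""
    per_rank = (n_params + world_size - 1) // world_size  # ceil division
    start = min(rank * per_rank, n_params)
    end = min(start + per_rank, n_params)
    return start, end


def zero1_step(params, grads_per_rank, momentum_shards, lr=0.1, beta=0.9):
    """One simulated training step with sharded optimizer state (assumed SGD + momentum)."""
    world_size = len(grads_per_rank)
    n = len(params)

    # Stand-in for the gradient all-reduce: average gradients across ranks.
    avg_grads = [sum(g[i] for g in grads_per_rank) / world_size for i in range(n)]

    new_params = list(params)
    for rank in range(world_size):
        start, end = shard_bounds(n, world_size, rank)
        momentum = momentum_shards[rank]  # this rank stores state for its shard only
        for local_idx, global_idx in enumerate(range(start, end)):
            momentum[local_idx] = beta * momentum[local_idx] + avg_grads[global_idx]
            new_params[global_idx] = params[global_idx] - lr * momentum[local_idx]

    # Stand-in for the all-gather: every rank ends up with the full updated parameters.
    return new_params


if __name__ == "__main__":
    params = [1.0, 2.0, 3.0, 4.0, 5.0]
    world_size = 2
    grads_per_rank = [[0.1] * 5, [0.3] * 5]
    momentum_shards = []
    for rank in range(world_size):
        start, end = shard_bounds(len(params), world_size, rank)
        momentum_shards.append([0.0] * (end - start))

    print([len(m) for m in momentum_shards])  # optimizer state sizes per rank
    print(zero1_step(params, grads_per_rank, momentum_shards))
```

The point of the sketch is the memory split: the momentum buffers across all ranks add up to one full copy of the optimizer state, instead of one full copy per rank.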

Hey @akulchik, are you still having problems with this?

+1. Edit: I solved this by using Python 3.9; 3.10 was causing issues. It's a temporary workaround for me.