Open-Sora-Plan
GaLore optimizer
Hi!
GaLore is an Adam-based optimizer that projects gradients into a low-rank subspace, so the optimizer-state memory is reduced and the extra gradient memory is near zero.
See the paper for more information: GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507).
This optimizer seems promising and could be useful for training this kind of large model. A usage sketch is below.
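As a rough illustration, here is a minimal sketch of how it might plug into a training loop, assuming the reference `galore_torch` package (`pip install galore-torch`) from the GaLore authors; the toy model and the hyperparameter values are placeholders, not settings tuned for Open-Sora-Plan:

```python
import torch
import torch.nn as nn
from galore_torch import GaLoreAdamW  # pip install galore-torch

# Toy stand-in for the real model; the same grouping applies to any nn.Module.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

# GaLore projects gradients of 2D weight matrices into a low-rank subspace,
# so only low-rank optimizer states are stored; other params use plain AdamW.
galore_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]

param_groups = [
    {"params": other_params},
    # rank / update_proj_gap / scale below are illustrative defaults from
    # the GaLore README, not values tuned for this repo.
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-4)

# Standard training step: the projection happens inside optimizer.step().
x = torch.randn(4, 512)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```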
Thank you for the suggestion! We've added it to our future plans.