Open-Sora-Plan
GaLore optimizer
Hi!
GaLore is an Adam-based optimizer that projects gradients into a low-rank subspace, so the optimizer-state memory is reduced and the extra gradient memory is near zero.
See the paper for more information: GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507).
This optimizer seems promising and could be useful for training this kind of large model. A usage sketch is below.
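As a rough illustration, here is a minimal sketch of how it might plug into a training loop, assuming the reference `galore_torch` package (`pip install galore-torch`) from the GaLore authors; the toy model and the hyperparameter values are placeholders, not settings tuned for Open-Sora-Plan:

```python
import torch
import torch.nn as nn
from galore_torch import GaLoreAdamW  # pip install galore-torch

# Toy stand-in for the real model; the same grouping applies to any nn.Module.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

# GaLore projects gradients of 2D weight matrices into a low-rank subspace,
# so only low-rank optimizer states are stored; other params use plain AdamW.
galore_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]

param_groups = [
    {"params": other_params},
    # rank / update_proj_gap / scale below are illustrative defaults from
    # the GaLore README, not values tuned for this repo.
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-4)

# Standard training step: the projection happens inside optimizer.step().
x = torch.randn(4, 512)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```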
Thank you for the suggestion! We've added it to our future plans.