mlx
Adding More Optimizers
Goal:
- decide which optimizers are worth implementing, in priority order
Remaining Optimizers + General Usage
From my understanding, only two of the optimizers listed in https://github.com/ml-explore/mlx/pull/718 are used often enough, either as alternatives or for particular classes of problems, that users may want them (?):
LBFGS and Averaged SGD
I believe the remaining ones listed are not as widely cited, so they could be deferred until they see further adoption.
re: https://github.com/ml-explore/mlx/pull/718#issuecomment-1957909442
From what I gathered, at least for language models, linear warmup (see mlx #721) and SGDR are the only two commonly used schedules that are not yet implemented. The latter can be generalized, in a similar way to the former, into a cyclic framework for other schedules (see CyclicalSchedule from https://mxnet.apache.org/versions/1.7/api/python/docs/tutorials/packages/gluon/training/learning_rates/learning_rate_schedules_advanced.html)
Do other frameworks implement SGDR? I wasn't able to find it when checking the popular frameworks.
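To make the "cyclic framework" idea concrete, here is a rough sketch in plain Python. It is not MLX API: every name here (`linear_warmup`, `cosine_decay`, `cyclical`, `sgdr`) is hypothetical, and it assumes a schedule is simply a function from step index to learning rate. SGDR then falls out as cosine decay wrapped in a generic restart helper, with an optional multiplier that lengthens each successive cycle.

```python
import math
from typing import Callable

# Assumption: a schedule maps a zero-based step index to a learning rate.
Schedule = Callable[[int], float]


def linear_warmup(base_lr: float, warmup_steps: int) -> Schedule:
    """Ramp linearly up to base_lr over warmup_steps, then hold."""
    def schedule(step: int) -> float:
        if step >= warmup_steps:
            return base_lr
        return base_lr * (step + 1) / warmup_steps
    return schedule


def cosine_decay(max_lr: float, min_lr: float, length: int) -> Schedule:
    """Cosine anneal from max_lr toward min_lr over `length` steps."""
    def schedule(step: int) -> float:
        t = min(step, length) / length
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
    return schedule


def cyclical(make_cycle: Callable[[int], Schedule],
             cycle_length: int, length_mult: int = 1) -> Schedule:
    """Restart a schedule every cycle.

    Cycle i lasts cycle_length * length_mult**i steps; make_cycle(n)
    builds the schedule used inside a cycle of n steps.
    """
    def schedule(step: int) -> float:
        length = cycle_length
        # Find the cycle containing `step` and the offset within it.
        while step >= length:
            step -= length
            length *= length_mult
        return make_cycle(length)(step)
    return schedule


def sgdr(max_lr: float, min_lr: float,
         cycle_length: int, length_mult: int = 1) -> Schedule:
    """SGDR-style schedule: cosine annealing with warm restarts."""
    return cyclical(lambda n: cosine_decay(max_lr, min_lr, n),
                    cycle_length, length_mult)


if __name__ == "__main__":
    lr = sgdr(max_lr=0.1, min_lr=0.0, cycle_length=10, length_mult=2)
    for step in (0, 5, 9, 10, 29, 30):
        print(step, round(lr(step), 4))
```

With `cycle_length=10` and `length_mult=2`, the rate restarts at `max_lr` on steps 0, 10, and 30. Any other per-cycle schedule could be passed to `cyclical` in place of `cosine_decay`, which is the generalization suggested above.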