
Adding More Optimizers

Open AndreSlavescu opened this issue 1 year ago • 2 comments

Goal:

  • Decide which optimizers are worth implementing, in priority order

Remaining Optimizers + General Usage

From my understanding, only 2 of the optimizers listed in https://github.com/ml-explore/mlx/pull/718 are actually used, either as alternatives to existing optimizers or for particular classes of problems that users may care about (?):

LBFGS and Averaged SGD
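As a rough illustration of what Averaged SGD adds on top of plain SGD, here is a minimal NumPy sketch of Polyak-Ruppert iterate averaging: ordinary SGD steps, plus a running mean of the iterates from a chosen step onward. The function name `averaged_sgd` and its `avg_start` parameter are hypothetical; this is not MLX's API and not a port of any particular framework's ASGD, which may add further decay terms.

```python
import numpy as np

def averaged_sgd(grad, x0, lr, steps, avg_start=0):
    """Plain SGD on x, plus a running mean of the iterates from step avg_start on.

    Hypothetical helper for illustration. Returns (last_iterate, averaged_iterate).
    If avg_start >= steps, no averaging happens and the average stays at x0.
    """
    x = np.asarray(x0, dtype=float).copy()
    x_avg = x.copy()
    n_avg = 0
    for t in range(steps):
        x = x - lr * grad(x)          # ordinary SGD step
        if t >= avg_start:
            n_avg += 1
            x_avg += (x - x_avg) / n_avg  # incremental mean of the iterates seen so far
    return x, x_avg
```

On a noisy quadratic with a constant learning rate, the last iterate keeps bouncing around the optimum, while the averaged iterate typically settles much closer to it:

```python
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
noisy_grad = lambda x: (x - target) + rng.normal(size=x.shape)
x_last, x_avg = averaged_sgd(noisy_grad, np.zeros(5), lr=0.1, steps=2000, avg_start=500)
```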

I believe the remaining ones listed are not as widely cited, so they could be ignored until they see further adoption.

re: https://github.com/ml-explore/mlx/pull/718#issuecomment-1957909442

AndreSlavescu, Feb 21 '24 22:02

From what I gathered, at least for language models, linear warmup (see mlx #721) and SGDR are the only two other commonly used learning-rate schedules that are not yet implemented. The latter can be generalized, in a similar way to the former, as a cyclic framework for other schedules (see CyclicalSchedule from https://mxnet.apache.org/versions/1.7/api/python/docs/tutorials/packages/gluon/training/learning_rates/learning_rate_schedules_advanced.html).
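To make the "cyclic framework" idea concrete, here is a framework-agnostic, plain-Python sketch in which a schedule is simply a function from step to learning rate. A generic `cyclic` wrapper restarts any per-cycle shape, with each cycle `t_mult` times longer than the last; SGDR is then `cyclic` applied to a half-cosine shape, and a triangular shape plugs into the same wrapper. All names here (`cyclic`, `cosine_shape`, `triangular_shape`, `linear_warmup`, `join`) are hypothetical: this borrows the idea of the linked MXNet tutorial, not its API, and it is not MLX's scheduler API.

```python
import math
from typing import Callable, Sequence

Schedule = Callable[[int], float]  # step -> learning rate
Shape = Callable[[float], float]   # fraction of the current cycle in [0, 1) -> learning rate

def cyclic(shape: Shape, t_0: int, t_mult: int = 1) -> Schedule:
    """Repeat `shape` in cycles: the first lasts t_0 steps, each later one t_mult times longer."""
    if t_0 < 1 or t_mult < 1:
        raise ValueError("t_0 and t_mult must be >= 1")
    def schedule(step: int) -> float:
        t_cur, t_i = step, t_0
        while t_cur >= t_i:  # skip over cycles that are already complete
            t_cur -= t_i
            t_i *= t_mult
        return shape(t_cur / t_i)
    return schedule

def cosine_shape(eta_max: float, eta_min: float = 0.0) -> Shape:
    """Half-cosine from eta_max down toward eta_min (the SGDR per-cycle shape)."""
    return lambda frac: eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * frac))

def triangular_shape(eta_max: float, eta_min: float = 0.0) -> Shape:
    """Linear ramp from eta_min up to eta_max at mid-cycle, then back down."""
    return lambda frac: eta_min + (eta_max - eta_min) * (1 - abs(2 * frac - 1))

def linear_warmup(init: float, end: float, steps: int) -> Schedule:
    """Ramp linearly from init to end over `steps` steps, then hold at end."""
    def schedule(step: int) -> float:
        if step >= steps:
            return end
        return init + (end - init) * step / steps
    return schedule

def join(schedules: Sequence[Schedule], boundaries: Sequence[int]) -> Schedule:
    """Use schedules[i] until boundaries[i] (ascending); each schedule counts steps from its own start."""
    def schedule(step: int) -> float:
        idx, start = 0, 0
        for i, b in enumerate(boundaries):
            if step >= b:
                idx, start = i + 1, b
        return schedules[idx](step - start)
    return schedule
```

Usage: 100 warmup steps, then SGDR whose cycles last 1000, 2000, 4000, ... steps, restarting at steps 1100, 3100, and so on:

```python
warmup = linear_warmup(0.0, 1e-3, steps=100)
sgdr = cyclic(cosine_shape(1e-3, 1e-5), t_0=1000, t_mult=2)
lr = join([warmup, sgdr], boundaries=[100])
```

Keeping the cycle logic separate from the per-cycle shape is what lets SGDR and other cyclical schedules share one implementation.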

chimezie, Feb 22 '24 10:02

Do other frameworks implement SGDR? I wasn't able to find it when checking the popular frameworks.

AndreSlavescu, Feb 22 '24 17:02