optax
Add a mathematical description of the algorithms
As is already done for adam and nadam, it would be nice to have mathematical descriptions of as many algorithms as possible. To start with, a clear description of what sgd with momentum and Nesterov acceleration computes would be very good. The algorithms to cover, where possible, are listed below (for some, the description may be too long). Cite the reference each time (on arXiv you can even download the source and potentially copy-paste the algorithm, but make sure it matches the implementation).
- [x] SGD: https://github.com/google-deepmind/optax/pull/830
- [x] AdaBelief: https://github.com/google-deepmind/optax/pull/869
- [ ] Adagrad
- [ ] Adafactor
- [x] Adamax, Adamaxw: https://github.com/google-deepmind/optax/pull/918
- [x] AdamW: https://github.com/google-deepmind/optax/pull/894
- [ ] AMSGrad
- [ ] Fromage
- [ ] Lamb
- [ ] Lars
- [ ] Lion
- [x] Noisy SGD: https://github.com/google-deepmind/optax/pull/857
- [ ] Novograd
- [ ] OptimisticGD
- [ ] DifferentiallyPrivateSGD
- [ ] Radam
- [ ] RMSProp
- [ ] SM3
- [ ] Yogi
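
As a starting point for the kind of description requested above, here is a minimal plain-Python sketch of SGD with momentum and its Nesterov variant on a scalar parameter. It assumes the formulation m_t = g_t + μ m_{t-1}, with update u_t = -α m_t (plain momentum) or u_t = -α (g_t + μ m_t) (Nesterov), and the convention that the update is added to the parameter. The names `sgd_momentum_step` and `minimize` are hypothetical and not part of optax; any formula written this way should still be checked against the actual implementation before it goes into the docs.

```python
def sgd_momentum_step(grad, trace, learning_rate, momentum, nesterov=False):
    """One SGD-with-momentum step on a scalar (illustrative sketch only).

    Returns (update, new_trace); the caller applies `param + update`.
    """
    # Accumulate the gradient trace: m_t = g_t + mu * m_{t-1}
    new_trace = grad + momentum * trace
    # Nesterov uses g_t + mu * m_t; plain momentum uses m_t directly.
    direction = grad + momentum * new_trace if nesterov else new_trace
    # Scale by the negative learning rate: u_t = -alpha * direction
    return -learning_rate * direction, new_trace


def minimize(nesterov, steps=3, learning_rate=0.1, momentum=0.9):
    """Run a few steps on f(x) = 0.5 * x**2, whose gradient is x."""
    x, trace = 1.0, 0.0
    for _ in range(steps):
        update, trace = sgd_momentum_step(
            x, trace, learning_rate, momentum, nesterov
        )
        x += update
    return x


if __name__ == "__main__":
    print("momentum:", minimize(nesterov=False))
    print("nesterov:", minimize(nesterov=True))
```

Working a few steps by hand on a toy quadratic like this is one cheap way to confirm that a written formula and the code agree before copying an algorithm from a paper.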