Optimisers.jl
Optimisers.jl defines many standard optimisers and utilities for learning loops.
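For orientation, a minimal usage sketch of the documented `setup`/`update!` loop; the NamedTuple model here is just a self-contained stand-in for any Functors-compatible model:

```julia
using Optimisers

# Any Functors-compatible structure works; a plain NamedTuple keeps this self-contained.
model = (weight = rand(Float32, 2, 3), bias = zeros(Float32, 2))
state = Optimisers.setup(Adam(0.001), model)                      # per-parameter state tree
grads = (weight = ones(Float32, 2, 3), bias = ones(Float32, 2))   # stand-in gradient
state, model = Optimisers.update!(state, model, grads)            # one optimisation step
```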
### Motivation and description

In https://arxiv.org/abs/2411.16085 the authors proposed "Cautious Optimizers: Improving Training with One Line of Code". Would you be interested in having this in Optimisers.jl? If so, since...
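One way this could fit the package is as a wrapper rule built on the documented custom-rule interface (`init`/`apply!`). The `Cautious` type below is hypothetical, a minimal sketch of the paper's masking idea rather than an agreed design:

```julia
using Optimisers
using Statistics: mean

# Hypothetical wrapper (not in Optimisers.jl): applies the "cautious" mask
# from arXiv:2411.16085 around any existing rule's proposed update.
struct Cautious{R<:Optimisers.AbstractRule} <: Optimisers.AbstractRule
    inner::R
end

Optimisers.init(o::Cautious, x::AbstractArray) = Optimisers.init(o.inner, x)

function Optimisers.apply!(o::Cautious, state, x, dx)
    state, dx′ = Optimisers.apply!(o.inner, state, x, dx)
    mask  = dx′ .* dx .> 0                    # keep components whose update agrees in sign with the gradient
    scale = inv(mean(mask) + eps(Float32))    # renormalise the mean step size, as in the paper
    return state, dx′ .* mask .* scale
end
```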
### Motivation and description

LiMuon ("LiMuon: Light and Fast Muon Optimizer for Large Models") is a resource-efficient variant of the Muon optimizer. It's getting a lot of attention today...
Hello, wouldn't it make more sense for the `AdamW` rule to differ from `Adam` by default, by making `lambda != 0.0`? The default value in [PyTorch](https://docs.pytorch.org/docs/2.7/generated/torch.optim.AdamW.html) is 0.01....
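To illustrate the point: with a zero decay, `AdamW()` behaves like plain `Adam()`, so the decay must be opted into explicitly. This sketch assumes the keyword constructor with a `lambda` field, as in recent releases:

```julia
using Optimisers

opt = AdamW()                # lambda = 0.0 by default, so this behaves like Adam
opt = AdamW(lambda = 0.01)   # explicit weight decay, matching PyTorch's default
```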
### Motivation and description

It would be useful to have a public API for the two steps within `update!` separately: https://github.com/FluxML/Optimisers.jl/blob/4ff61fca27c31f6d6bbc6bac19019b1de3634fd7/src/interface.jl#L70-L79 In the total gradient is typically more useful than...
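For context, the two steps in question can already be written out for a single array using the documented per-leaf rule interface (`init`/`apply!`); the request is for a public equivalent at the whole-model level:

```julia
using Optimisers

rule = Descent(0.1)
x  = rand(Float32, 3)
dx = ones(Float32, 3)

st = Optimisers.init(rule, x)                 # per-parameter optimiser state
st, dx′ = Optimisers.apply!(rule, st, x, dx)  # step 1: transform the raw gradient
x .-= dx′                                     # step 2: subtract it from the parameters
```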