Optimisers.jl
Optimisers.jl defines many standard optimisers and utilities for learning loops.
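For orientation, a minimal usage sketch of the documented `setup`/`update!` loop; the NamedTuple model here is just a self-contained stand-in for any Functors-compatible model:

```julia
using Optimisers

# Any Functors-compatible structure works; a plain NamedTuple keeps this self-contained.
model = (weight = rand(Float32, 2, 3), bias = zeros(Float32, 2))
state = Optimisers.setup(Adam(0.001), model)                      # per-parameter state tree
grads = (weight = ones(Float32, 2, 3), bias = ones(Float32, 2))   # stand-in gradient
state, model = Optimisers.update!(state, model, grads)            # one optimisation step
```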
### Motivation and description

In https://arxiv.org/abs/2411.16085 the authors proposed "Cautious Optimizers: Improving Training with One Line of Code". Would you be interested in having this in Optimisers.jl? If so, since...
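One way this could fit the package is as a wrapper rule built on the documented custom-rule interface (`init`/`apply!`). The `Cautious` type below is hypothetical, a minimal sketch of the paper's masking idea rather than an agreed design:

```julia
using Optimisers
using Statistics: mean

# Hypothetical wrapper (not in Optimisers.jl): applies the "cautious" mask
# from arXiv:2411.16085 around any existing rule's proposed update.
struct Cautious{R<:Optimisers.AbstractRule} <: Optimisers.AbstractRule
    inner::R
end

Optimisers.init(o::Cautious, x::AbstractArray) = Optimisers.init(o.inner, x)

function Optimisers.apply!(o::Cautious, state, x, dx)
    state, dx′ = Optimisers.apply!(o.inner, state, x, dx)
    mask  = dx′ .* dx .> 0                    # keep components whose update agrees in sign with the gradient
    scale = inv(mean(mask) + eps(Float32))    # renormalise the mean step size, as in the paper
    return state, dx′ .* mask .* scale
end
```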
### Motivation and description

LiMuon ("LiMuon: Light and Fast Muon Optimizer for Large Models") is a resource-efficient variant of the Muon optimizer. It's getting a lot of attention today...
Hello, wouldn't it make more sense for the `AdamW` rule to differ from `Adam` by default, by making `lambda != 0.0`? The default value in [PyTorch](https://docs.pytorch.org/docs/2.7/generated/torch.optim.AdamW.html) is 0.01....
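To illustrate the point: with a zero decay, `AdamW()` behaves like plain `Adam()`, so the decay must be opted into explicitly. This sketch assumes the keyword constructor with a `lambda` field, as in recent releases:

```julia
using Optimisers

opt = AdamW()                # lambda = 0.0 by default, so this behaves like Adam
opt = AdamW(lambda = 0.01)   # explicit weight decay, matching PyTorch's default
```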
### Motivation and description

It would be useful to have a public API for the two steps within `update!` separately: https://github.com/FluxML/Optimisers.jl/blob/4ff61fca27c31f6d6bbc6bac19019b1de3634fd7/src/interface.jl#L70-L79 In the total gradient is typically more useful than...
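For context, the two steps in question can already be written out for a single array using the documented per-leaf rule interface (`init`/`apply!`); the request is for a public equivalent at the whole-model level:

```julia
using Optimisers

rule = Descent(0.1)
x  = rand(Float32, 3)
dx = ones(Float32, 3)

st = Optimisers.init(rule, x)                 # per-parameter optimiser state
st, dx′ = Optimisers.apply!(rule, st, x, dx)  # step 1: transform the raw gradient
x .-= dx′                                     # step 2: subtract it from the parameters
```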