
Optimisers.jl defines many standard optimisers and utilities for learning loops.

56 Optimisers.jl issues

This proposes to add some kind of differentiable `sum(f, trainable(x))` which walks the model. I'm not certain this is the right thing yet. Right now this gets all trainable parameters....

enhancement
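For orientation, here is a rough, non-differentiable sketch of what the proposed walk could do, built on Functors' `fmap` which Optimisers.jl already uses internally. The helper name `sum_params` and the fact that it visits all numeric arrays (not only `trainable` ones) are assumptions for illustration, not the proposed API:

```julia
using Functors

# Hypothetical helper: apply `f` to every numeric leaf array and sum the results.
# The actual proposal would respect `trainable` at each node and be differentiable.
function sum_params(f, model)
    total = 0.0
    fmap(model; exclude = x -> x isa AbstractArray{<:Number}) do x
        total += f(x)   # accumulate f over this parameter array
        return x        # leave the model unchanged
    end
    return total
end

# e.g. an L2 penalty over all parameters (usage illustrative only):
# penalty = sum_params(x -> sum(abs2, x), model)
```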

This is a proposal for an alternative to `destructure` which doesn't completely flatten the parameters but returns a nested named tuple. The associated reconstructor can be used on `ComponentArray`s...

In some situations, you have to restructure a lot if you use Flux, for instance if you want to run your batches as separate solves in DiffEqFlux using an EnsembleProblem....

enhancement
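For context, this is the existing `destructure` API that the two issues above want to complement; only the final comment speculates about the proposed nested form:

```julia
using Optimisers, Flux

model = Chain(Dense(2 => 3, tanh), Dense(3 => 1))

# Current behaviour: one flat vector plus a reconstructor that copies it back
# into a fresh model each time it is called.
flat, re = Optimisers.destructure(model)
model2 = re(flat)            # rebuild from (possibly modified) parameters

# The proposal would instead return a nested NamedTuple mirroring the model,
# whose reconstructor could also accept a ComponentArray (sketch only).
```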

This adds a variant of `destructure` with minimal changes such that it writes back into the original model, instead of creating a copy. This may close #146, cc @glatteis. Marked...
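A hedged sketch of the usage the PR above seems to aim for; the `re!` name and its signature are guesses, not the PR's actual interface:

```julia
using Optimisers, Flux

model = Chain(Dense(2 => 3, tanh), Dense(3 => 1))
flat, re = Optimisers.destructure(model)   # today: `re(flat)` allocates a new model

# Hypothetical in-place counterpart: overwrite `model`'s own arrays instead of
# building a copy, which is what "writes back into the original model" suggests.
# re!(model, 2 .* flat)    # afterwards model's weights would hold the scaled values
```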

This came up here: https://discourse.julialang.org/t/on-the-future-of-flux-destructure-and-sciml-integration/104760/4. Maybe the idea that ComponentArrays.jl treats shared arrays as independent should be mentioned as an important difference, perhaps at the end of this section: https://fluxml.ai/Optimisers.jl/dev/#Obtaining-a-flat-parameter-vector...

documentation
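The difference being asked about shows up with a pair of tied arrays. The `destructure` behaviour below is the documented one; the contrast with ComponentArrays.jl is as described in the issue:

```julia
using Optimisers

w = rand(3, 3)
tied = (a = w, b = w)        # two fields sharing one underlying array

flat, re = Optimisers.destructure(tied)
length(flat)                 # 9, not 18: destructure stores the shared array once

# ComponentArrays.jl would instead give `a` and `b` independent blocks of storage,
# so the tie is silently broken -- the difference the issue wants documented.
```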

`destructure` uses `map`, I think from before support for Dict was added elsewhere, hence this fails:

```julia
julia> d = Dict( :a => Dict( :b => Dict( :c => 1,...
```

bug
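A hedged reconstruction of the failing call, since the snippet above is cut off; the exact values and nesting depth are made up for illustration:

```julia
using Optimisers

# Nested Dicts, roughly in the shape the issue starts to build:
d = Dict(:a => Dict(:b => Dict(:c => 1, :d => [2.0, 3.0])))

# Because `destructure` walks containers with `map`, which predates Dict support
# elsewhere in the package, this call fails instead of flattening `d`:
flat, re = Optimisers.destructure(d)
```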

https://github.com/FluxML/Optimisers.jl/pull/136 missed some stray uses of `state` in the readme.

Dear All, I have found a bug in the Adam optimiser when used with Float16. An MWE would look like this:

```julia
using Optimisers
x = Float16[0.579, -0.729, 0.5493]
δx =...
```
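Since the MWE above is truncated, here is a hedged way to probe the report: apply one Adam step to the same values in Float16 and Float32 and compare. The gradient values are invented for illustration:

```julia
using Optimisers

g = [0.01, 0.02, -0.01]                      # made-up gradient for illustration

x16 = Float16[0.579, -0.729, 0.5493]
st16 = Optimisers.setup(Adam(0.01), x16)
st16, x16 = Optimisers.update(st16, x16, Float16.(g))

x32 = Float32[0.579, -0.729, 0.5493]
st32 = Optimisers.setup(Adam(0.01), x32)
st32, x32 = Optimisers.update(st32, x32, Float32.(g))

@show x16 x32    # a large drift or NaN in x16 would reproduce the reported bug
```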

```
using Flux

function test_setup(opt, s)
    state = Flux.setup(opt, s)
    return state
end

s = Chain(
    Dense(2 => 100, softsign),
    Dense(100 => 2)
)
opt = Adam(0.1)
@code_warntype test_setup(opt, s)...
```

This lets you chain optimisation rules like `setup(ClipGrad(1.0) => WeightDecay() => Descent(0.1), model)`. It's just notation & printing; there is no functional change to `OptimiserChain`. Edit: now uses `>>` instead...
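For reference, what the new notation is sugar for. `OptimiserChain` is the existing, unchanged API; only the commented line paraphrases the PR's syntax:

```julia
using Optimisers

# The existing way to compose rules, which the PR's notation merely abbreviates:
rule = OptimiserChain(ClipGrad(1.0), WeightDecay(), Descent(0.1))

# state = Optimisers.setup(rule, model)            # as usual, for some `model`

# Per the PR (after its edit), roughly:
# ClipGrad(1.0) >> WeightDecay() >> Descent(0.1)   # builds the same OptimiserChain
```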