Brian Chen
The optax one sees the entire tree. In their example they map over all leaves in the callback function, but as long as you return something with the same shape...
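For comparison, a minimal Julia sketch of the same idea (the names here are hypothetical, this is not the optax API): a hook receives the whole parameter tree and only has to return something with the same structure.

```julia
using Functors

# Hypothetical whole-tree hook: takes the entire nested params tree and
# returns a tree with the same structure (here, every array leaf rescaled).
scale_tree(tree) = fmap(x -> x isa AbstractArray ? 0.1f0 .* x : x, tree)

params = (dense = (W = rand(Float32, 3, 3), b = zeros(Float32, 3)),)
scaled = scale_tree(params)  # same nesting, rescaled leaves
```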
`setup` is defined in Optimisers.jl, and it's inherently type unstable because it uses a cache to detect + handle shared parameters. Usually I would mark this as a WONTFIX, but...
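To make the shared-parameter point concrete, a small example (assuming current Optimisers.jl behaviour) of why `setup` needs that cache:

```julia
using Optimisers

# The same array appears twice in the model, i.e. a tied parameter.
shared = rand(Float32, 3)
model = (a = shared, b = shared)

state = Optimisers.setup(Adam(1f-2), model)

# The cache lets `setup` notice the aliasing and reuse one Leaf for both
# branches, so the tied parameter isn't updated twice per step.
state.a === state.b  # expected to be true
```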
We could use `_return_type` or friends to do that, yes. One thing I'd like to try to make that easier is to delegate what `Functors.CachedWalk` currently does to the callback...
Looks like the inference path `_return_type` uses might not be able to work through the recursion? I wonder if we could use a trick like https://github.com/FluxML/Functors.jl/pull/61 to prevent it from bailing.
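For anyone who wants to poke at this, a rough way to see what inference concludes (using the internal `Core.Compiler.return_type`, so results will vary by Julia version; the toy model here is made up):

```julia
using Optimisers

model = (layers = ((W = rand(2, 2), b = rand(2)), (W = rand(2, 2), b = rand(2))),)

# If inference gives up partway through the recursive walk in `setup`,
# this comes back as `Any` (or a very wide Union) instead of a concrete type.
T = Core.Compiler.return_type(Optimisers.setup, Tuple{typeof(Adam()), typeof(model)})
```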
What were you thinking of as a solution for a quick release? Something like ```julia functor(::Type{
Since chaining rules is, as mentioned, so close to `f ∘ g`, can we just overload that operator? Unlike `=>`, which is just an alias for `Pair`, and `|>`, which...
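Roughly what I have in mind, as a sketch only (this overload is not in Optimisers.jl, and the exact dispatch would need more thought):

```julia
using Optimisers

# `f ∘ g` means "apply g first, then f", so composing rules this way builds
# the chain in reverse argument order relative to OptimiserChain.
Base.:∘(f::Optimisers.AbstractRule, g::Optimisers.AbstractRule) = OptimiserChain(g, f)

# Usage: clip gradients first, then take the Adam step.
rule = Adam(1f-3) ∘ ClipGrad(1.0)
```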
I personally don't think asking users to write the order of rules "backwards" is a big deal, but that might just be me. Suggesting `OptimiserChain` be aliased to something short...
Yes, finding a good operator is the higher priority.
> (In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

That's https://github.com/FluxML/Optimisers.jl/pull/46#discussion_r795262521 and https://github.com/FluxML/Flux.jl/pull/1612. PyTorch inexplicably chooses to do its own thing,...
I would actually be in favour of behaviour 3: `destructure` is fundamentally a function that promises too much, and even after the effort made towards tightening that (+ improving correctness)...
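For context on what `destructure` promises, a minimal round-trip example (behaviour as documented in Optimisers.jl; the toy model is made up):

```julia
using Optimisers

model = (W = rand(Float32, 2, 2), b = zeros(Float32, 2))

flat, re = Optimisers.destructure(model)  # flat vector of parameters + rebuilder
model2 = re(2 .* flat)                    # rebuild the same structure from new values

# Keeping this round trip correct for arbitrary nested models is the part
# that "promises too much".
```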