Optimisers.jl
write `>>` as infix notation for `OptimiserChain`
This lets you chain optimisation rules like `setup(ClipGrad(1.0) => WeightDecay() => Descent(0.1), model)`. It's just notation & printing; there is no functional change to `OptimiserChain`.
Edit: now uses `>>` instead of `=>`.
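A minimal sketch of what this notation amounts to, using toy stand-in types (the real `AbstractRule` and `OptimiserChain` live in Optimisers.jl and their actual definitions may differ):

```julia
# Toy stand-ins; real definitions are in Optimisers.jl.
abstract type AbstractRule end

struct ClipGrad    <: AbstractRule; delta::Float64;  end
struct WeightDecay <: AbstractRule; lambda::Float64; end
struct Descent     <: AbstractRule; eta::Float64;    end

struct OptimiserChain{T<:Tuple} <: AbstractRule
    opts::T
end
OptimiserChain(opts::AbstractRule...) = OptimiserChain(opts)

# `>>` builds a chain; the extra methods flatten, so that chaining
# three rules gives one flat OptimiserChain(a, b, c), not nested ones.
Base.:(>>)(a::AbstractRule,   b::AbstractRule)   = OptimiserChain(a, b)
Base.:(>>)(a::OptimiserChain, b::AbstractRule)   = OptimiserChain(a.opts..., b)
Base.:(>>)(a::AbstractRule,   b::OptimiserChain) = OptimiserChain(a, b.opts...)
Base.:(>>)(a::OptimiserChain, b::OptimiserChain) = OptimiserChain(a.opts..., b.opts...)

rule = ClipGrad(1.0) >> WeightDecay(1e-4) >> Descent(0.1)
```

Because both "chain on the left" and "chain on the right" methods flatten, the associativity of `>>` doesn't matter here: three chained rules still collapse into a single flat `OptimiserChain`.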
What about using `|>` instead?
That's possible. But `x |> f` normally returns a value... the operation seems more like `f ∘ g` in that it returns another object like `f`. In fact `Chain` is almost exactly `∘`, just written left-to-right.
`f |> g` would be the natural candidate for the reversed composition operator. Also, making `r1 => r2` not return a `Pair` is technically breaking, while defining
`(r2::AbstractRule)(r1::AbstractRule) = OptimiserChain(r1, r2)`
would only break piratical (and unlikely) similar definitions.
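A toy sketch of this trick (stand-in types, not the package's actual code): since `x |> f` simply calls `f(x)`, making rules callable gives `r1 |> r2` chain semantics without adding any method to `|>` itself:

```julia
# Toy stand-ins for the real Optimisers.jl types:
abstract type AbstractRule end
struct WeightDecay <: AbstractRule; lambda::Float64; end
struct Descent     <: AbstractRule; eta::Float64;    end
struct OptimiserChain{T<:Tuple} <: AbstractRule; opts::T; end
OptimiserChain(opts::AbstractRule...) = OptimiserChain(opts)

# Calling one rule on another builds a chain, so `r1 |> r2` works
# through Base's generic `|>(x, f) = f(x)` -- no piracy of `|>`.
(r2::AbstractRule)(r1::AbstractRule) = OptimiserChain(r1, r2)
# Flatten when the left of `|>` is already a chain:
(r2::AbstractRule)(c::OptimiserChain) = OptimiserChain(c.opts..., r2)

rule = WeightDecay(1e-4) |> Descent(0.1)
```

Since `|>` is left-associative, `a |> b |> c` hits the flattening method and still yields one flat `OptimiserChain`.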
Since chaining rules is so close to `f ∘ g` as mentioned, can we just overload that operator? Unlike `=>` which is just an alias for `Pair`, and `|>` which has well-defined (and different) semantics, we are basically composing rules here.
One problem with `∘` is that the order is backwards. `OptimiserChain` matches `Flux.Chain` in putting the first operation leftmost.
We could literally use `Flux.Chain` by moving the `struct Chain` here, to be shorter. But maybe this is confusing. Rules are a lot like functions which take `dx` and return a modified `dx`, but not exactly, whereas layers really are functions.
I personally don't think asking users to write the order of rules "backwards" is a big deal, but that might just be me. Suggesting `OptimiserChain` be aliased to something short (e.g. Optax has `chain`) could be a docs-only solution. If we're ok with unicode, then I don't think `\rightarrow` and co are bound to anything. Ideally we could find an operator like `*` which parses to `op(a, b, c, ...)`, so that only one call to `OptimiserChain` is required.
The only other vararg infix operator is `++`. That said, nested binary calls like `:(1 => 2 => 3)` aren't a big deal to digest; indeed `:(sin ∘ cos ∘ tan)` parses as binary calls but prints as if it were one.
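Both parsing claims are easy to check with `Meta.parse`; nothing package-specific is involved:

```julia
# `++` is parsed n-ary: one call expression carrying every operand.
ex1 = Meta.parse("a ++ b ++ c")
# ex1.head is :call, ex1.args is (++, a, b, c) -- a single vararg call.

# `=>` is binary and right-associative: nested Pair calls.
ex2 = Meta.parse("a => b => c")
# ex2.args is (=>, a, b => c) -- the right operand is itself a nested call.
```

So an operator at the `++` level would need only one `OptimiserChain(a, b, c)` call, whereas `=>`-style operators build nested pairs that the constructor would have to unravel (or flatten, as the `>>` methods do).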
Yes, finding a good operator is the higher priority.
I still think `|>` makes the most sense: it means composition left to right, although it is "sourced" by the initial argument in the chain. As an alternative, `\rightarrow` (which can also be typed `\to`) would be fine.
We discussed this during the call. In general, all the options outlined here don't seem worth it. There is very little value-add, and we are either breaking/bending existing semantics or unnecessarily using up an infix operator (the latter being poor form given that Julia is not permissive about these).
There are already workable solutions for users: use the new `import ... as ...` syntax, or use `const ... = OptimiserChain` (the latter works for infix operators like `\rightarrow` mentioned above).
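A sketch of those user-side solutions; toy stand-ins replace the package types so the snippet is self-contained, and with the real package the alias would instead be `import Optimisers: OptimiserChain as Chain` (Julia 1.6+):

```julia
# Toy stand-ins for the real Optimisers.jl types:
abstract type AbstractRule end
struct WeightDecay <: AbstractRule; lambda::Float64; end
struct Descent     <: AbstractRule; eta::Float64;    end
struct OptimiserChain{T<:Tuple} <: AbstractRule; opts::T; end
OptimiserChain(opts::AbstractRule...) = OptimiserChain(opts)

# Renaming alias (in real code: `import Optimisers: OptimiserChain as Chain`):
const Chain = OptimiserChain
rule1 = Chain(WeightDecay(1e-4), Descent(0.1))

# `→` (typed \rightarrow or \to) parses as an infix operator but is unbound
# in Base, so a const binding gives infix syntax for free:
const → = OptimiserChain
rule2 = WeightDecay(1e-4) → Descent(0.1)
```

One caveat: arrow operators like `→` are binary and (I believe) right-associative, so `a → b → c` would build nested chains rather than one flat `OptimiserChain`.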
Here's some additional context for the current naming: https://github.com/FluxML/Optimisers.jl/pull/9#discussion_r566559542 (which also goes into stuff discussed here like function composition being a bit confusing).
One option we settled on is to have an un-exported verb: `Optimisers.chain`. This will just make the `OptimiserChain` for you, but it has the benefit of not clashing with `Flux.Chain` if someone wants to explicitly import it. Given the generality of the name, we will not export it by default.
Building in two distinct names so that someone can write `using Optimisers: chain` and afterwards save a few characters doesn't seem great.
The nice thing about infix syntax like `WeightDecay() => Descent(0.1)` is that it also saves one level of annoying nested brackets, in both typing and printing.
> breaking

I am very dubious that even one case of `Pair(AbstractRule, AbstractRule)` exists in the wild. Nevertheless, we just missed a golden opportunity to roll this into the 0.3 change.
> don't think asking users to write the order of rules "backwards" is a big deal

I do think this is quite confusing. (In fact I remain a little confused why `AdamW` seems to be backwards, but that's another topic.)
> unnecessarily using up an infix operator

This objection is to defining a currently undefined operator, like `++` or `\rightarrow`; it's awkward when two packages use the same one.

It does not apply to abusing other built-in infix operators. We could use, say, `>>`? This has the advantage that, unlike `|>` and `∘`, it does not have a confusing nearby pre-existing meaning.
> (In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

That's https://github.com/FluxML/Optimisers.jl/pull/46#discussion_r795262521 and https://github.com/FluxML/Flux.jl/pull/1612. PyTorch inexplicably chooses to do their own thing, and despite reading https://github.com/pytorch/pytorch/pull/21250#issuecomment-520559993 multiple times I still don't understand why.