Optimisers.jl icon indicating copy to clipboard operation
Optimisers.jl copied to clipboard

write `>>` as infix notation for `OptimiserChain`

Open mcabbott opened this issue 2 years ago • 12 comments

This lets you chain optimisation rules like setup(ClipGrad(1.0) => WeightDecay() => Descent(0.1), model). It's just notation & printing, there is no functional change to OptimiserChain.

Edit: now uses >> instead of =>.

mcabbott avatar Apr 06 '23 13:04 mcabbott

What about using |> instead?

CarloLucibello avatar Apr 06 '23 15:04 CarloLucibello

That's possible. But x |> f normally returns a value... the operation seems more like f ∘ g in that it returns another object like f. In fact Chain is almost exactly , just written left-to-right.

mcabbott avatar Apr 06 '23 16:04 mcabbott

f |> g would be the natural candidate for the reversed composition operator. Also, making r1 => r2 not return a Pair is technically breaking, while defining

(r2::AbstractRule)(r1::AbstractRule) = OptimiserChain(r1, r2)

would only break piratical (and unlikely) similar definitions.

CarloLucibello avatar Apr 07 '23 11:04 CarloLucibello

Since chaining rules is so close to f ∘ g as mentioned, can we just overload that operator? Unlike => which is just an alias for Pair and |> which has well-defined (and different) semantics, we are basically composing rules here.

ToucheSir avatar Apr 10 '23 18:04 ToucheSir

One problem with is that the order is backwards. OptimiserChain matches Flux.Chain in putting the first operation leftmost.

We could literally use Flux.Chain by moving the struct Chain here, to be shorter. But maybe this is confusing. Rules are a lot like functions which take dx and return modified dx, but not exactly, whereas layers really are.

mcabbott avatar Apr 10 '23 19:04 mcabbott

I personally don't think asking users to write the order of rules "backwards" is a big deal, but that might just be me. Suggesting OptimiserChain be aliased to something short (e.g. Optax has chain) could be a docs-only solution. If we're ok with unicode, then I don't think \rightarrow and co are bound to anything. Ideally we could find an operator like * which parses to op(a, b, c, ...) so that only one call to OptimiserChain is required.

ToucheSir avatar Apr 10 '23 19:04 ToucheSir

The only other vararg infix operator is ++. Although it isn't a big deal to digest things like :(1 => 2 => 3), indeed :(sin ∘ cos ∘ tan) is binary but prints as if it's not.

mcabbott avatar Apr 10 '23 19:04 mcabbott

Yes, finding a good operator is the higher priority.

ToucheSir avatar Apr 10 '23 20:04 ToucheSir

I still think |> makes the most sense, it means composition left to right although it is "sourced" by the initial argument in the chain. As an alternative, also \rightarrow (can be typed also \to) is fine.

CarloLucibello avatar Apr 12 '23 04:04 CarloLucibello

We discussed this during the call. In general, all the options outlined here don't seem worth it. There is very little value-add, and we are either breaking/bending existing semantics or unnecessarily using up an infix operator (the latter being poor form given that Julia is not permissive about these).

There are already workable solutions for users: use the new import ... as ... syntax or use const ... = OptimiserChain (the latter works for infix operators like \rightarrow mentioned above).

Here's some additional context for the current naming: https://github.com/FluxML/Optimisers.jl/pull/9#discussion_r566559542 (which also goes into stuff discussed here like function composition being a bit confusing).

One option we settled on is to have an un-exported verb: Optimisers.chain. This will just make the OptimiserChain for you, but it has the benefit of not clashing with Flux.Chain if someone wants to explicitly import it. Given the generality of the name, we will not export it by default.

darsnack avatar Apr 21 '23 17:04 darsnack

Building in two distinct names to allow someone to write using Optimisers: chain and afterwards save a few chars doesn't seem great.

The nice thing about infix syntax like WeightDecay() => Descent(0.1) is that it also saves one level of annoying nested brackets. In both typing and printing.

breaking

I am very dubious that one case of Pair(AbstractRule, AbstractRule) exists in the wild. Nevertheless we just missed a golden opportunity to roll this into the 0.3 change.

don't think asking users to write the order of rules "backwards" is a big deal

I do think this is quite confusing. (In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

unnecessarily using up an infix operator

This objection is to defining a currently undefined operator, like ++ or \rightarrow. It's awkward when two packages use the same one.

It does not apply to abusing other built-in infix operators. We could use say >>? This has the advantage that, unlike |> and , it does not have a confusing nearby pre-existing meaning.

mcabbott avatar Sep 12 '23 04:09 mcabbott

(In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

That's https://github.com/FluxML/Optimisers.jl/pull/46#discussion_r795262521 and https://github.com/FluxML/Flux.jl/pull/1612. PyTorch inexplicably chooses to do their own thing, and despite reading https://github.com/pytorch/pytorch/pull/21250#issuecomment-520559993 multiple times I still don't understand why.

ToucheSir avatar Sep 12 '23 14:09 ToucheSir