Optimisers.jl icon indicating copy to clipboard operation
Optimisers.jl copied to clipboard

Controlling which optimizer rule you're adjusting.

Open murrellb opened this issue 11 months ago • 5 comments

This PR tries to address the issue in https://github.com/FluxML/Optimisers.jl/issues/201 where, if two Rules have the same parameter name, you can't use the adjust interface to control them independently.

This is intended to be non-breaking, so the previous behavior is undisturbed: image but you can now additionally specify the type you wish to change: image

Implementation-wise, this added quite a few adjust and adjust! methods. I'm not 100% sure all of them are needed, so maybe we can prune a few back but I wasn't quite sure of the edge cases I'm not aware of, so I tried to cover everything.

I this PR's version, analogous to above, if you just pass in a float you'll change eta, but only in the optimizer you intend: image This is maybe less critical to have, so we could cut back all of the non-keyword methods that specify a type.

Doing all this introduces a bit of core package developer complexity, but it is all generic so you shouldn't need to touch it when adding a new optimizer, and the usage seems pretty intuitive from a user perspective, so I don't think it adds interface complexity. To be honest, the current behavior seems like a footgun, and as a user with an OptimizerChain, I'd only ever use adjust/adjust! while specifying the type, because otherwise I'd have to check the parameter list of all the OptimizerChain components just to be sure I'm not doing something I don't intend.

PR Checklist

  • [ ] Tests are added
  • [ ] Documentation, if applicable

murrellb avatar Dec 19 '24 19:12 murrellb

I can't easily copy-paste from screenshots...

  1. The spelling proposed here is Optimisers.adjust!(os, SignDecay, lambda=0.3), passing just the type.

Alternatives are:

  1. Optimisers.adjust!(os, SignDecay(0.3)), passing a whole new rule.

  2. Changing e.g. SignDecay to use lambda1 and WeightDecay to use lambda2 to avoid the clash.

  3. Something more exotic, like adjust!(fun::Function, os, lambda=0.3) where this fun must return a Bool and gets... the rule?

  4. Leave it to you:

julia> using Optimisers, Functors

julia> os = Optimisers.setup(OptimiserChain(SignDecay(0.1), AdamW(lambda=0.1)), (; x=[12 34.]))
(x = Leaf(OptimiserChain(SignDecay(0.1), AdamW(0.001, (0.9, 0.999), 0.1, 1.0e-8, true)), (nothing, ([0.0 0.0], [0.0 0.0], (0.9, 0.999)))),)

julia> fmap(os; exclude = r -> r isa SignDecay) do r
         Optimisers.adjust(r, lambda=0.333)
       end
(x = Leaf(OptimiserChain(SignDecay(0.333), AdamW(0.001, (0.9, 0.999), 0.1, 1.0e-8, true)), (nothing, ([0.0 0.0], [0.0 0.0], (0.9, 0.999)))),)

Are there other options?

I suppose 3 doesn't help if you wish to adjust some learning rates and not others, we aren't going to rename them like eta37.

The advantage of 2 is you need never know the field names. The disadvantage is that adjust!(os, AdamW(lambda=0.3)) will reset eta too, you have to still have your original choices around.

mcabbott avatar Dec 21 '24 18:12 mcabbott

This should be easy, convenient, and visible (so that the user is aware of the possible footgun). That rules out 5.

Optimisers.adjust!(os, SignDecay(0.3)), passing a whole new rule.

Will this replace a whole OptimiserChain, or just the rule that matches the type? The fact that I'm asking means that the user has to memorize/understand an extra convention to use this.

Something more exotic, like adjust!(fun::Function, os, lambda=0.3) where this fun must return a Bool and gets... the rule?

I'm not sure what I'd do with this that you can't do with the current PR, but with (slightly) more difficulty.

Another option is allowing you to label an optimizer component, with a default that matches its type. So you can go:

os = Optimisers.setup(OptimiserChain(SignDecay(0.1, label = "a"), AdamW(lambda=0.1)), (; x=[12 34.]))
Optimisers.adjust!(os, "a", lambda = 0.3)
Optimisers.adjust!(os, "AdamW", lambda = 0.01) #"AdamW" is the default label for AdamW.

This would let you have two optimisers of the same type in a chain, but label them differently so you can adjust them independently (which my current PR doesn't allow). However I can't think of motivation for needing this, and this would mean we have to introduce changes to all the current optimisers (and might invalidate checkpointed opt states, depending on how it is done?).

Still prefer the current PR's solution, but I'm not picky as long as it is easy to use and visible.

murrellb avatar Dec 21 '24 19:12 murrellb

I agree 5 is messy.

But the point is that the state tree is some simple nested Julia object, which you are free to manipulate however you like -- you do not have to work through any explicit API of this package, and hence the API does not have to expand to allow arbitrarily complex manipulations.

Maybe I'm a little hesitant to add API too quickly, which is why I think we should think through alternatives carefully.

Adding labels seems way over the top to me.

or just the rule that matches the type?

Obviously just the rule, surely? If you have setup Momentum, and do adjust!(... AdamW...) in any form, it can't replace the whole rule because it can't change the saved momenta -- that's the whole point of adjust! vs. a fresh setup.

mcabbott avatar Dec 21 '24 19:12 mcabbott

Obviously just the rule, surely? If you have setup Momentum, and do adjust!(... AdamW...) in any form, it can't replace the whole rule because it can't change the saved momenta -- that's the whole point of adjust! vs. a fresh setup.

This will only be "obvious" to a user who has both i) a very clear understanding of how this package does what it does, and ii) a mental model of the package devs. They also have to be very confident that there isn't some other reason (that they're currently unaware of) that doesn't eg. force them to have to re-specify the entire rule (eg. the package internals for some reason not handling automatic construction of the rest of the optimizer chain). Not obvious.

murrellb avatar Dec 21 '24 19:12 murrellb

I think the proposed solution is a light addition and intuitive addition to the interface and solves a concrete problem with no simple workaround, so we should go with it.

CarloLucibello avatar Dec 30 '24 19:12 CarloLucibello