Awni Hannun
That reminds me of the way [optax does schedules](https://optax.readthedocs.io/en/latest/optax-101.html#weight-decay-schedules-and-clipping), which I actually find pretty nice (and am hoping we will follow something similar). Basically the `learning_rate` (and some other...
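For reference, a minimal sketch of that optax pattern (the values here are just placeholders): the learning rate argument can be either a plain float or a schedule, i.e. a function of the step count, and the optimizer resolves the current value internally on each update.

```python
import optax

# A schedule is just a callable mapping step -> learning rate.
schedule = optax.exponential_decay(
    init_value=1e-3,        # starting learning rate
    transition_steps=1000,  # steps per decay period
    decay_rate=0.9,         # multiplicative decay factor
)

# The optimizer accepts either a schedule or a constant.
optimizer = optax.adamw(learning_rate=schedule)
constant_optimizer = optax.adamw(learning_rate=1e-3)
```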
> Also the main reason is not only to avoid calling the callable but also to move the responsibility of state keeping one layer above, to the trainer. That makes...
> It only works for whatever the author of the optimizer (and only the optimizer) had in mind to make dynamic. For instance, does it work for the betas values?...
> For instance how would we even log the learning rate.

Calling `optimizer.learning_rate` should give the right learning rate for logging?

> Would we save the step as an array...
Thanks for the revision. This is a bit in between what I was suggesting and what @angeloskath was suggesting. I think if we go with the route of explicit schedule classes...
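To make the comparison concrete, here is a rough sketch of what the explicit-schedule-class route could look like. The names (`ExponentialDecay`, `SGDWithSchedule`) and the optimizer internals are purely illustrative, not a proposed `mlx.optimizers` API:

```python
# Hypothetical sketch of explicit schedule classes; names are illustrative only.
class ExponentialDecay:
    def __init__(self, init_value, decay_rate, decay_steps):
        self.init_value = init_value
        self.decay_rate = decay_rate
        self.decay_steps = decay_steps

    def __call__(self, step):
        # Current learning rate as a pure function of the step count.
        return self.init_value * self.decay_rate ** (step / self.decay_steps)


class SGDWithSchedule:
    def __init__(self, schedule):
        self.schedule = schedule
        self.step = 0  # kept as optimizer state so it can be saved/restored

    @property
    def learning_rate(self):
        # Always reflects the value used for the current step, which also
        # answers the logging question quoted above.
        return self.schedule(self.step)

    def update(self, params, grads):
        lr = self.learning_rate
        self.step += 1
        return {k: params[k] - lr * grads[k] for k in params}
```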
@postmalloc are you still working on this? Just curious what the plan is as there have been some requests for schedulers :)
> I wasn't actually sure if a consensus had been reached between the approach you suggested and what

Makes sense. I think it will be easier to criticize the pros/cons...
Hi @postmalloc checking in on this. Are you still working on the PR?
Hi @postmalloc are you planning to work on this PR at all? If not, let's close it so someone else can work on schedulers.
The part you shared looks nice; it's pretty simple. One of our optimizers (AdaFactor) already has the step as state, so we'd need to refactor that into the base class...
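As a rough illustration of that refactor (a simplified stand-in, not the actual `mlx.optimizers` code), the base class could own the step counter and bump it on every update, so AdaFactor and any schedule can read a shared counter:

```python
# Simplified stand-in for the base-class refactor discussed above.
class Optimizer:
    def __init__(self):
        self.state = {"step": 0}  # saved/loaded with the rest of the optimizer state

    def update(self, params, grads):
        self.state["step"] += 1
        return self.apply_gradients(params, grads)

    def apply_gradients(self, params, grads):
        raise NotImplementedError


class AdaFactor(Optimizer):
    def apply_gradients(self, params, grads):
        step = self.state["step"]  # shared counter from the base class
        ...  # AdaFactor update rule elided
```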