Kyle Daruwalla
What about replacing `destructure` with this code plus the ComponentArrays construction, rather than adding it as a separate function? It would move a lot of the tricky stuff to...
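Here is a rough sketch of the flatten/rebuild pair under discussion. It assumes the model is a single-level NamedTuple of arrays. `flatten` is a hypothetical helper, not `destructure` itself and not ComponentArrays' API. It records each leaf's name, offset, and shape, which is roughly the bookkeeping a ComponentArray's axes carry.

```julia
# Hypothetical helper (not Optimisers.destructure): flatten a single-level
# NamedTuple of arrays into one vector and return a closure that rebuilds it.
function flatten(model::NamedTuple)
    ks = keys(model)
    flat = reduce(vcat, [vec(model[k]) for k in ks])        # column-major order
    offsets = cumsum([0; [length(model[k]) for k in ks]])   # leaf i spans offsets[i]+1:offsets[i+1]
    shapes = map(k -> size(model[k]), ks)
    function rebuild(v::AbstractVector)
        leaves = ntuple(i -> reshape(v[offsets[i]+1:offsets[i+1]], shapes[i]), length(ks))
        return NamedTuple{ks}(leaves)
    end
    return flat, rebuild
end

model = (W = [1.0 2.0; 3.0 4.0], b = [5.0, 6.0])
flat, rebuild = flatten(model)   # flat == [1.0, 3.0, 2.0, 4.0, 5.0, 6.0]
```

Returning a closure keeps the offsets and shapes out of the flat vector. That is the same split between the data and the structure that a ComponentArray stores in its axes.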
This will need a custom `adjust` too.
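This minimal sketch shows why a wrapper rule needs its own `adjust`. It uses hypothetical local types rather than Optimisers.jl's. In this sketch, the generic fallback rebuilds a rule with its `eta` field replaced. A wrapper that only stores an inner rule has no `eta`, so it forwards the new value to the rule it wraps.

```julia
abstract type Rule end

# Hypothetical stand-in rule with a learning rate field.
struct Descent <: Rule
    eta::Float64
end

# Hypothetical wrapper: stores the rule it modifies, but has no `eta` of its own.
struct Wrapper{R<:Rule} <: Rule
    opt::R
end

# Generic fallback (this sketch's assumption): rebuild the rule with `eta` replaced.
function adjust(r::Rule, eta::Real)
    T = typeof(r)
    :eta in fieldnames(T) || throw(ArgumentError("$T has no field eta"))
    return T(map(f -> f === :eta ? eta : getfield(r, f), fieldnames(T))...)
end

# Custom method: forward the new learning rate to the inner rule.
adjust(r::Wrapper, eta::Real) = Wrapper(adjust(r.opt, eta))
```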
It’s unfortunate that the API here doesn’t use `OptimiserChain` like `AccumGrad`, but I think it is unavoidable. We need to invoke the inner optimizer’s `apply!` with higher precision which we...
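Here is a sketch of the mixed-precision idea. It assumes a simplified `init`/`apply!` convention, in which `apply!` returns the new state and a step that the caller subtracts from `x`. `Descent` and `MixedPrecision` are defined locally for illustration and are not Optimisers.jl's types. The wrapper keeps a higher-precision copy of `x` in its state and runs the inner rule on that copy. It then returns the step that moves `x` onto the rounded copy.

```julia
abstract type Rule end

# Hypothetical stand-in rule: apply! returns (state, step) and does not mutate.
struct Descent <: Rule
    eta::Float64
end
init(::Descent, x) = nothing
apply!(r::Descent, state, x, dx) = (state, r.eta .* dx)

# Hypothetical wrapper: T is the precision the inner rule runs in.
struct MixedPrecision{T<:AbstractFloat,R<:Rule} <: Rule
    opt::R
end
MixedPrecision{T}(opt::R) where {T<:AbstractFloat,R<:Rule} = MixedPrecision{T,R}(opt)

# State: a high-precision copy of x plus the inner rule's state for that copy.
function init(r::MixedPrecision{T}, x) where {T}
    xT = T.(x)
    return (xT, init(r.opt, xT))
end

function apply!(r::MixedPrecision{T}, state, x, dx) where {T}
    xT, inner = state
    inner, delta = apply!(r.opt, inner, xT, T.(dx))   # inner rule sees only type-T arrays
    xT .-= delta                                      # update the high-precision copy
    return (xT, inner), x .- xT                       # x .- step lands on xT, rounded to eltype(x)
end
```

With `Float16` parameters and a step of `1e-4`, plain `Float16` updates on `1.0` round back to `1.0` every time. The `Float32` copy accumulates the small steps until the rounded parameter actually moves.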
Apologies for the train of comments. I guess one distinction here is a gradient transformation vs. an optimizer transformation. Something like `AccumGrad` is a gradient transformation. But something like a...
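To make the distinction concrete, here is a small sketch that assumes a gradient transformation is just a function of `dx`. `clip_norm`, `scale`, and `chain` are hypothetical names, not Optimisers.jl's `ClipNorm` or `OptimiserChain`. Because each step only sees the gradient, a chain is plain left-to-right composition. A wrapper like the higher-precision rule mentioned above cannot be one link in such a chain, because it has to control how the inner rule's `apply!` is invoked.

```julia
using LinearAlgebra: norm

# Hypothetical gradient transformations: each maps a gradient dx to a new gradient.
clip_norm(maxnorm) = dx -> (n = norm(dx); n > maxnorm ? dx .* (maxnorm / n) : dx)
scale(eta) = dx -> eta .* dx

# A chain applies the transformations left to right.
chain(ts...) = dx -> foldl((g, t) -> t(g), ts; init = dx)

g = chain(clip_norm(1.0), scale(0.1))
```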
> That opens the thorny question of what happens to dx when the same x appears twice, are they added? In the case of `total`, I think yes. But in...
Also, I don't think adding `total(f, model, grads)` is worth holding this PR up.
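Here is a sketch of how a `total(f, model)` helper could support regularization. It assumes a model built from nested Tuples and NamedTuples with array leaves. The name matches the one discussed, but the code is hypothetical. Visiting each distinct array only once (tracked by identity in an `IdDict`) is an assumption of this sketch, and only one possible answer to the tied-weights question quoted above.

```julia
# Hypothetical: sum f over every numeric array in a nested model.
# Arrays reachable through several paths (tied weights) are counted once.
total(f, model) = _total(f, model, IdDict{Any,Nothing}())

function _total(f, x::AbstractArray{<:Number}, seen)
    haskey(seen, x) && return 0.0   # identity check, not value equality
    seen[x] = nothing
    return f(x)
end
_total(f, x::Union{Tuple,NamedTuple}, seen) = sum(c -> _total(f, c, seen), x; init = 0.0)
_total(f, x, seen) = 0.0   # non-array leaves such as activation functions

W = [1.0 2.0; 3.0 4.0]
model = (layers = ((weight = W, bias = [1.0, 0.0], σ = tanh), (weight = W, bias = [0.0])),)
penalty = total(x -> sum(abs2, x), model)   # 30 from W (counted once) + 1 from the biases
```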
Bump on merging this? We still get regularization questions frequently.
Thanks! Feel free to submit a PR even if it isn't complete, and I'd be happy to help guide it.
Sorry, I had a thought here, but I forgot to write it down. I could see Option 2 being the more attractive one. We could even go one step further...
Re: multiple hyper-parameters, I wrote the solution in #34 for a single hyper-parameter, but you could easily make `Scheduler` hold multiple schedules that can be used in the constructor: ```julia...
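Here is a sketch of the multi-schedule idea, using hypothetical names. `MultiScheduler` is not the `Scheduler` from #34, and `Momentum` is a stand-in rule with a keyword constructor. Each hyper-parameter gets its own schedule, which is a function of the step `t`. All the schedules are evaluated and passed to the constructor as keyword arguments.

```julia
# Hypothetical: one schedule (a function of the step t) per hyper-parameter.
struct MultiScheduler{C,S<:NamedTuple}
    constructor::C
    schedules::S
end
MultiScheduler(constructor; schedules...) = MultiScheduler(constructor, (; schedules...))

# Evaluate every schedule at step t and pass the results as keyword arguments.
(s::MultiScheduler)(t::Integer) = s.constructor(; map(sch -> sch(t), s.schedules)...)

# Hypothetical stand-in rule with a keyword constructor.
struct Momentum
    eta::Float64
    rho::Float64
end
Momentum(; eta = 0.01, rho = 0.9) = Momentum(eta, rho)

sched = MultiScheduler(Momentum; eta = t -> 0.1 * 0.5^(t - 1), rho = t -> t < 3 ? 0.5 : 0.9)
opt = sched(3)   # eta = 0.025, rho = 0.9
```

Storing the schedules as a NamedTuple keeps the hyper-parameter names attached to their schedules. Any keyword the constructor accepts can be scheduled, and a keyword left out of the scheduler falls back to the constructor's default.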