Kyle Daruwalla comments

Results: 404 comments by Kyle Daruwalla

Add `trainables_nt`

What about replacing `destructure` with this code + the ComponentArrays construction, as opposed to adding this as a separate function? It would move a lot of the tricky stuff to...

Rule for mixed precision training

This will need a custom `adjust` too

Rule for mixed precision training

It’s unfortunate that the API here doesn’t use `OptimiserChain` like `AccumGrad`, but I think it is unavoidable. We need to invoke the inner optimizer’s `apply!` with higher precision which we...

Rule for mixed precision training

Apologies for the train of comments. I guess one distinction here is a gradient transformation vs. an optimizer transformation. Something like `AccumGrad` is a gradient transformation. But something like a...

Add `total(f, model)` to replace implicit `sum(f, Flux.params(model))`

> That opens the thorny question of what happens to dx when the same x appears twice, are they added? In the case of `total`, I think yes. But in...

Add `total(f, model)` to replace implicit `sum(f, Flux.params(model))`

Also, I don't think adding `total(f, model, grads)` is worth holding this PR up.

Add `total(f, model)` to replace implicit `sum(f, Flux.params(model))`

Bump on merging this? We still get regularization questions frequently.

Add more testing for `ComposedSchedule`

Thanks! Feel free to submit a PR even if it isn't complete, and I'd be happy to help guide it.

Using a wrapper + mutation to "implicitly" update scheduled parameters?

Sorry, I had a thought here, but I forgot to write it down. I could see Option 2 being the more attractive one. We could even go one step further...

Using a wrapper + mutation to "implicitly" update scheduled parameters?

Re: multiple hyper-parameters: I wrote the solution in #34 for a single hyper-parameter, but you could easily make `Scheduler` hold multiple schedules that can be used in the constructor: ```julia...