Michael Abbott

1143 comments by Michael Abbott

I commented the method out just to focus on getting one thing working first. I believe it still needs tests, but otherwise this is nearly done. Maybe I should check...

I'd like to ask for some time to read this closely. I haven't yet understood why it doesn't use `OptimiserChain`. Not sure that exposing `MixedPrecision{Float32}` as the official way to specify the...
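For reference, `OptimiserChain` is Optimisers.jl's existing composition mechanism that the question above refers to. A minimal sketch of the usual chaining pattern; the particular rules and hyperparameters here are just illustrative:

```julia
using Optimisers

# OptimiserChain applies each rule's gradient transformation in sequence,
# left to right: here ClipGrad runs before Adam.
model = (W = rand(Float32, 3, 3), b = zeros(Float32, 3))
rule  = OptimiserChain(ClipGrad(1f0), Adam(1f-3))
state = Optimisers.setup(rule, model)

grad = (W = ones(Float32, 3, 3), b = ones(Float32, 3))
state, model = Optimisers.update(state, model, grad)
```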

I have questions but haven't had time to read up more. First, this setup starts with the low-precision model, and temporarily stores a higher-precision copy for the purpose of accumulation...
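A small illustration of why that higher-precision copy matters for accumulation. This is not the PR's code, just a sketch of the underlying numerical issue:

```julia
# Repeatedly adding a small step: in Float16 each update rounds away
# entirely (1e-4 is below half the eps of Float16 at 1.0), while a
# Float32 accumulator keeps every contribution.
function accumulate_demo(T; n = 1000)
    w = one(T)
    for _ in 1:n
        w += T(1f-4)  # stand-in for a tiny gradient update
    end
    return w
end

accumulate_demo(Float16)  # Float16(1.0): every step is lost to rounding
accumulate_demo(Float32)  # ≈ 1.1f0: preserved in higher precision
```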

Just to sketch another possibility, Flux could instead wrap pairs of low+high precision copies of the same model:

```julia
bimodel = MixedPrec(Float16, model32)  # makes a Float16 copy, stores a...
```
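Filling in what such a wrapper could look like, purely as a hedged sketch: `MixedPrec` and its field names are hypothetical, not an existing Flux type; only `f16` and `Functors.@functor` are real API.

```julia
using Flux, Functors

# Hypothetical wrapper pairing a low-precision working copy with the
# original high-precision model. Forward/backward passes would run
# through `low`, while updates accumulate into `high`.
struct MixedPrec{L,H}
    low::L    # e.g. Float16 copy, used for forward/backward
    high::H   # e.g. Float32 master copy, used for accumulation
end
Functors.@functor MixedPrec

MixedPrec(::Type{Float16}, model) = MixedPrec(f16(model), model)
```

Flux's existing `f16`/`f32` conversions make the copy; the open design question is how updates flow from `high` back into `low`.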

Not sure I follow. If `fmap` were made differentiable, how would you write `total(norm, model)` using it?
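To make the question concrete, here is one hedged way `total` could be written on top of `fmap`, using a side-effecting accumulator (`total_via_fmap` is a made-up name). The mutation of `acc` is exactly what makes differentiating through `fmap` awkward:

```julia
using Functors, LinearAlgebra

# Walks the model with fmap, summing f over numeric-array leaves.
# The Ref mutation is opaque to most ADs, which is the sticking point.
function total_via_fmap(f, model)
    acc = Ref(0.0)
    fmap(model; exclude = x -> x isa AbstractArray{<:Number}) do x
        acc[] += f(x)
        x  # fmap expects each leaf to be returned
    end
    return acc[]
end

total_via_fmap(norm, (W = rand(3, 3), b = rand(3)))
```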

OK. I guess this PR's take is that, since essentially nothing else about Functors.jl is type-stable and everything takes a few μs, there's not much point pushing hard here. Decomposing into...
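A quick local check of that claim; timings are machine-dependent, and the point is the order of magnitude rather than the exact number:

```julia
using Functors, BenchmarkTools

m = (W = rand(Float32, 4, 4), b = rand(Float32, 4))
@btime fmap(x -> x .+ 1, $m)          # typically on the order of μs
@code_warntype fmap(x -> x .+ 1, m)   # see how much is concretely inferred
```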

#143 and https://github.com/FluxML/Optimisers.jl/pull/57#issuecomment-1065598383 hint that the signature here should probably allow for `total(f, model, grads)` to mean roughly `sum(f(x,dx) for ...)`. That opens the thorny question of what happens to...
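A hedged sketch of that two-argument form, walking the model and gradient trees in parallel with `fmap` (the name `total2` is made up here, not part of Optimisers.jl):

```julia
using Functors

# Sums f(x, dx) over matching numeric-array leaves of two trees.
function total2(f, model, grads)
    acc = Ref(0.0)
    fmap(model, grads; exclude = x -> x isa AbstractArray{<:Number}) do x, dx
        acc[] += f(x, dx)
        x
    end
    return acc[]
end

model = (W = rand(2, 2), b = rand(2))
grads = (W = ones(2, 2), b = ones(2))
total2((x, dx) -> sum(x .* dx), model, grads)  # roughly sum(f(x,dx) for ...)
```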

> But in the general case, it should apply the reduction operator supplied?

Thinking more, I do think they are always added. If I do something like `mapreduce(norm, max,...
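A concrete illustration of why the choice of reduction operator matters, using plain arrays rather than any Optimisers.jl API:

```julia
using LinearAlgebra

xs = [[3.0, 4.0], [6.0, 8.0]]
mapreduce(norm, max, xs)  # 10.0: the largest per-leaf norm
mapreduce(norm, +, xs)    # 15.0: per-leaf norms always added, as suggested above
```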

No, it's not... I suspect you have `Base.isnumeric`; there's an unfortunate name clash? Or perhaps this has just rotted, sorry. I've been meaning to revisit but have been busy.
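The clash being guessed at: Base exports `isnumeric` as a character predicate, while Optimisers.jl has its own internal `isnumeric` helper for numeric arrays, so an unqualified name can silently resolve to the wrong one:

```julia
using Optimisers

Base.isnumeric('3')                      # true: Unicode character predicate
Optimisers.isnumeric(rand(Float32, 2))   # true: the array predicate meant here
isnumeric('3')                           # unqualified: resolves to Base's method
```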

Now rebased, but the tests tell me I need to remember what on earth `_Tangent_biwalk` does.