Michael Abbott

1143 comments by Michael Abbott

I commented the method out just to focus on getting one thing working first. I believe it still needs tests, but otherwise this is nearly done. Maybe I should check...

I'd like to ask for some time to read this closely. I haven't yet understood why it doesn't use `OptimiserChain`. Not sure that exposing `MixedPrecision{Float32}` as the official way to specify the...
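For reference, `OptimiserChain` is Optimisers.jl's existing composition mechanism that the question above refers to. A minimal sketch of the usual chaining pattern; the particular rules and hyperparameters here are just illustrative:

```julia
using Optimisers

# OptimiserChain applies each rule's gradient transformation in sequence,
# left to right: here ClipGrad runs before Adam.
model = (W = rand(Float32, 3, 3), b = zeros(Float32, 3))
rule  = OptimiserChain(ClipGrad(1f0), Adam(1f-3))
state = Optimisers.setup(rule, model)

grad = (W = ones(Float32, 3, 3), b = ones(Float32, 3))
state, model = Optimisers.update(state, model, grad)
```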

I have questions but haven't had time to read up more. First, this setup starts with the low-precision model, and temporarily stores a higher-precision copy for the purpose of accumulation...
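A small illustration of why that higher-precision copy matters for accumulation. This is not the PR's code, just a sketch of the underlying numerical issue:

```julia
# Repeatedly adding a small step: in Float16 each update rounds away
# entirely (1e-4 is below half the eps of Float16 at 1.0), while a
# Float32 accumulator keeps every contribution.
function accumulate_demo(T; n = 1000)
    w = one(T)
    for _ in 1:n
        w += T(1f-4)  # stand-in for a tiny gradient update
    end
    return w
end

accumulate_demo(Float16)  # Float16(1.0): every step is lost to rounding
accumulate_demo(Float32)  # ≈ 1.1f0: preserved in higher precision
```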

Just to sketch another possibility, Flux could instead wrap pairs of low+high precision copies of the same model:

```julia
bimodel = MixedPrec(Float16, model32)  # makes a Float16 copy, stores a...
```
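Filling in what such a wrapper could look like, purely as a hedged sketch: `MixedPrec` and its field names are hypothetical, not an existing Flux type; only `f16` and `Functors.@functor` are real API.

```julia
using Flux, Functors

# Hypothetical wrapper pairing a low-precision working copy with the
# original high-precision model. Forward/backward passes would run
# through `low`, while updates accumulate into `high`.
struct MixedPrec{L,H}
    low::L    # e.g. Float16 copy, used for forward/backward
    high::H   # e.g. Float32 master copy, used for accumulation
end
Functors.@functor MixedPrec

MixedPrec(::Type{Float16}, model) = MixedPrec(f16(model), model)
```

Flux's existing `f16`/`f32` conversions make the copy; the open design question is how updates flow from `high` back into `low`.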

Not sure I follow. If `fmap` were made differentiable, how would you write `total(norm, model)` using it?
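To make the question concrete, here is one hedged way `total` could be written on top of `fmap`, using a side-effecting accumulator (`total_via_fmap` is a made-up name). The mutation of `acc` is exactly what makes differentiating through `fmap` awkward:

```julia
using Functors, LinearAlgebra

# Walks the model with fmap, summing f over numeric-array leaves.
# The Ref mutation is opaque to most ADs, which is the sticking point.
function total_via_fmap(f, model)
    acc = Ref(0.0)
    fmap(model; exclude = x -> x isa AbstractArray{<:Number}) do x
        acc[] += f(x)
        x  # fmap expects each leaf to be returned
    end
    return acc[]
end

total_via_fmap(norm, (W = rand(3, 3), b = rand(3)))
```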

OK. I guess this PR's take is that, since essentially nothing else about Functors.jl is type-stable and everything takes a few μs, there's not much point pushing hard here. Decomposing into...
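A quick local check of that claim; timings are machine-dependent, and the point is the order of magnitude rather than the exact number:

```julia
using Functors, BenchmarkTools

m = (W = rand(Float32, 4, 4), b = rand(Float32, 4))
@btime fmap(x -> x .+ 1, $m)          # typically on the order of μs
@code_warntype fmap(x -> x .+ 1, m)   # see how much is concretely inferred
```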

#143 and https://github.com/FluxML/Optimisers.jl/pull/57#issuecomment-1065598383 hint that the signature here should probably allow for `total(f, model, grads)` to mean roughly `sum(f(x,dx) for ...)`. That opens the thorny question of what happens to...
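A hedged sketch of that two-argument form, walking the model and gradient trees in parallel with `fmap` (the name `total2` is made up here, not part of Optimisers.jl):

```julia
using Functors

# Sums f(x, dx) over matching numeric-array leaves of two trees.
function total2(f, model, grads)
    acc = Ref(0.0)
    fmap(model, grads; exclude = x -> x isa AbstractArray{<:Number}) do x, dx
        acc[] += f(x, dx)
        x
    end
    return acc[]
end

model = (W = rand(2, 2), b = rand(2))
grads = (W = ones(2, 2), b = ones(2))
total2((x, dx) -> sum(x .* dx), model, grads)  # roughly sum(f(x,dx) for ...)
```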

> But in the general case, it should apply the reduction operator supplied?

Thinking more, I do think they are always added. If I do something like `mapreduce(norm, max,...
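A concrete illustration of why the choice of reduction operator matters, using plain arrays rather than any Optimisers.jl API:

```julia
using LinearAlgebra

xs = [[3.0, 4.0], [6.0, 8.0]]
mapreduce(norm, max, xs)  # 10.0: the largest per-leaf norm
mapreduce(norm, +, xs)    # 15.0: per-leaf norms always added, as suggested above
```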

No, it's not... I suspect you have `Base.isnumeric`; there's an unfortunate name clash? Or perhaps this has just rotted, sorry. I've been meaning to revisit but have been busy.
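The clash being guessed at: Base exports `isnumeric` as a character predicate, while Optimisers.jl has its own internal `isnumeric` helper for numeric arrays, so an unqualified name can silently resolve to the wrong one:

```julia
using Optimisers

Base.isnumeric('3')                      # true: Unicode character predicate
Optimisers.isnumeric(rand(Float32, 2))   # true: the array predicate meant here
isnumeric('3')                           # unqualified: resolves to Base's method
```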

Now rebased, but the tests tell me I need to remember what on earth `_Tangent_biwalk` does.