Michael Abbott
Oh right, thanks both. I guess there are many details of this union I don't see yet. But Functors/Optimisers are interested in AD with nested structs, and there might be...
I guess cases like this are the motivation for treating `Fill` as being one number:
```julia
julia> gradient(y -> sum(sqrt, gradient(x -> sum(x.^2), y)[1]), reshape(1:6,2,3))[1]
2×3 Matrix{Float64}:
 0.707107  0.408248  0.316228...
```
Maybe the short version is that I'd like more examples of where `Fill` gets used. Your argument is that it must be treated as structurally constant, and while I agree...
> for me, one basic requirement of a reverse-mode AD system is that a single reverse-pass should have the same asymptotic time-complexity as the forwards-pass

Perhaps it's useful to back...
Trying #962 on this (CPU):
```
julia> @btime gradient(x -> sum(abs2, net(x)), $(rand(50,50,50,50)));
  1.133 s (8266 allocations: 3.07 GiB)  # master
  618.508 ms (8120 allocations: 524.94 MiB)  # first version...
```
I believe that commit should be safe and give correct answers. Great that it helps much more in your timing than in mine, labelled "without mutation" above. I think that all...
OK, this is a real model! I'm curious whether #962 (i.e. Zygote#master, right now) and #981 make any difference? My guess is that they won't, since every weight is used...
Same as #1247 I think, runs if you add https://github.com/FluxML/Zygote.jl/issues/1247#issuecomment-1175332642
Following up on @mzgubic's simplification, here is a more minimal example, and one without a second derivative at all. The problem appears to be that Zygote is getting confused by...
ForwardDiff's Dual numbers should work with complex numbers, but the way they are produced and consumed would need to change. The functions are a bit sloppy, they add a dual...