
Regression in comprehension support

Open • ChrisRackauckas opened this issue Aug 20 '22 • 12 comments

MWE:

using Zygote

function loss_adjoint(p)
    prediction = p.*rand(2,100)
    prediction = [prediction[:, i] for i in axes(prediction, 2)]
    sum(sum.(prediction))
end

Zygote.gradient(loss_adjoint, ones(2))
ERROR: MethodError: no method matching +(::Base.RefValue{Any}, ::NamedTuple{(:contents,), Tuple{Matrix{Float64}}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::ChainRulesCore.Tangent{P}, ::P) where P at C:\Users\accou\.julia\packages\ChainRulesCore\ctmSK\src\tangent_arithmetic.jl:146
  +(::VectorizationBase.CartesianVIndex, ::Any) at C:\Users\accou\.julia\packages\VectorizationBase\oCgEJ\src\cartesianvindex.jl:67
  ...
Stacktrace:
 [1] accum(x::Base.RefValue{Any}, y::NamedTuple{(:contents,), Tuple{Matrix{Float64}}})
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\lib\lib.jl:17
 [2] accum(x::Base.RefValue{Any}, y::NamedTuple{(:contents,), Tuple{Matrix{Float64}}}, zs::Nothing)
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\lib\lib.jl:22
 [3] Pullback
   @ c:\Users\accou\OneDrive\Computer\Desktop\test.jl:144 [inlined]
 [4] (::typeof(∂(loss_adjoint)))(Δ::Float64)
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\compiler\interface2.jl:0
 [5] (::Zygote.var"#60#61"{typeof(∂(loss_adjoint))})(Δ::Float64)
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\compiler\interface.jl:45
 [6] gradient(f::Function, args::Vector{Float64})
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\compiler\interface.jl:97
 [7] top-level scope
   @ c:\Users\accou\OneDrive\Computer\Desktop\test.jl:149

ChrisRackauckas commented Aug 20 '22 17:08

You say regression, but not on which version this is known to work.

It appears to be one of the cases where re-using a variable name (for something with a different type) confuses Zygote. This is (IMO) a bad idea for humans too, and avoiding it avoids the problem:

julia> function loss_adjoint2(p)
           prediction = p.*rand(2,100)
           prediction2 = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction2))
       end
loss_adjoint2 (generic function with 1 method)

julia> Zygote.gradient(loss_adjoint2, ones(2))
([52.379754440153576, 53.055517322087056],)

(Also, eachcol will be more efficient.)
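For reference, a sketch of the eachcol variant (untested here; the function name is just for illustration):

using Zygote

function loss_adjoint_eachcol(p)
    prediction = p .* rand(2, 100)
    # eachcol yields lazy column views instead of copying each column out
    sum(sum.(eachcol(prediction)))
end

Zygote.gradient(loss_adjoint_eachcol, ones(2))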

mcabbott commented Aug 20 '22 19:08

Testing in a fresh env with just Zygote, this works on 0.6.43. No other packages were changed between tests. Looking into things now...

ToucheSir commented Aug 20 '22 20:08

That's more concerning. I went back as far as trying 0.6.0... note that internally it still does more work when the name is reused, but it does not accumulate a wrong answer:

(@v1.5) pkg> add Zygote@0.6.0

julia> function loss_adjoint1(p)  # re-uses name like the question, but has well-defined answer
           prediction = p.*ones(2,100)
           prediction = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction))
       end
loss_adjoint1 (generic function with 1 method)

julia> function loss_adjoint3(p)  # does not re-use name, 3x faster, same answer
           prediction = p.*ones(2,100)
           prediction3 = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction3))
       end
loss_adjoint3 (generic function with 1 method)

julia> @btime Zygote.gradient(loss_adjoint1, ones(2))
  643.416 μs (3483 allocations: 464.59 KiB)
([100.0, 100.0],)

julia> @btime Zygote.gradient(loss_adjoint3, ones(2))
  121.750 μs (1218 allocations: 415.53 KiB)
([100.0, 100.0],)

mcabbott commented Aug 20 '22 20:08

Reduced:

using Zygote

function loss_adjoint(p)
  prediction = 2p
  boxed_fn(i) = prediction^i
  # Trigger https://github.com/JuliaLang/julia/issues/15276
  prediction = boxed_fn(2)
  return prediction
end

Zygote.gradient(loss_adjoint, 1.0)

The error is coming from attempted accumulation of gradients for the inner closure (which contains a Core.Box). One is the expected (; prediction=Ref{Any}((; contents=...))), but the other has an unwrapped inner NamedTuple (; prediction=(; contents=...)).
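
The clash can be reproduced in isolation by calling Zygote's internal accum on the two representations directly (the values below are illustrative; this mirrors frame [1] of the stack trace above):

using Zygote

# Zygote's mutable structural tangent for the Core.Box field...
dbox_wrapped = Base.RefValue{Any}((contents = 1.0,))
# ...and the same kind of tangent after _project has stripped the Ref
dbox_unwrapped = (contents = 4.0,)

# accum has no rule for this pair, falls through to +, and throws
# MethodError: no method matching +(::Base.RefValue{Any}, ::NamedTuple{(:contents,), ...})
Zygote.accum(dbox_wrapped, dbox_unwrapped)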

With some logging in @adjoint literal_getproperty, we can get a clearer picture of what's going on:

┌ Info: getfield fwd
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   f = :prediction
└   val = Core.Box(2.0)
┌ Info: getfield fwd
│   x = Core.Box(2.0)
│   f = :contents
└   val = 2.0
┌ Info: getfield fwd
│   x = Core.Box(4.0)
│   f = :contents
└   val = 4.0
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 1.0
└   dx = Base.RefValue{Any}((contents = 1.0,))
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 4.0
└   dx = Base.RefValue{Any}((contents = 4.0,))
┌ Info: getfield back imm
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   Δ = Base.RefValue{Any}((contents = 4.0,))
│   dx = (prediction = Base.RefValue{Any}((contents = 4.0,)),)
└   _project(x, dx) = (prediction = (contents = 4.0,),)

_project is doing the recursive unwrapping of Refs in https://github.com/FluxML/Zygote.jl/blob/328eb4d122a12b0a6c4947e17278081e877169cf/src/compiler/chainrules.jl#L347.

Why did this work before? Prior to https://github.com/FluxML/Zygote.jl/pull/1248, Zygote was simply accumulating the Boxed closure field as an implicit param (even when implicit params weren't being used) and returning nothing. This caused bugs when differentiating wrt. inner mutable types, e.g.:

julia> gradient(nt -> 2*nt.a.x, (; a=Ref(1.0)))
(nothing,)           # 0.6.43
((a = (x = 2.0,),),) # 0.6.44

Thinking about fixes, I feel _project round-tripping through CRC may be too lossy for this case. Perhaps we should have dedicated Z-Z projection machinery for mutable structs?

ToucheSir commented Aug 20 '22 23:08

I don't know whether my problem is the same, but it persists on 0.6.44 and 0.6.43.

function logvar(prob; ps_=prob.p, n=100)
    sum(msolve(prob, ps=ps_) for i in 1:n)
end

function dlogvar(prob, n=100)
    Zygote.gradient(ps_ -> logvar(prob; ps_=ps_, n=n), prob.p)[1]
end

Here msolve essentially solves a Neural SDE, and in dlogvar I want to differentiate wrt. the Lux model parameters ps. I cannot make out any reuse of names (I even added the extra _ to make sure), but there is also something wrong with accum and NamedTuples:

MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
  [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:17
  [2] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [4] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [6] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [7] accum(x::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, y::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [8] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [9] accum(x::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, Nothing}}, y::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
 [10] Pullback
    @ ./reduce.jl:62 [inlined]
 [11] (::typeof(∂(_foldl_impl)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/compiler/interface2.jl:0

It works for n=2. Replacing the comprehension with a loop doesn't help either:

function logvar(prob; ps_=prob.p, n=100)
    #sum(msolve(prob, ps=ps_) for i in 1:n)
    x = 0.
    for i in 1:n
        x+=msolve(prob, ps=ps_)
    end
    x
end
MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
  [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
    @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:17
  [2] macro expansion
    @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
  [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
  [4] macro expansion
    @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
  [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
  [6] Pullback
    @ ~/code/OptImpSampling.jl/src/logvar.jl:87 [inlined]

axsk commented Aug 23 '22 12:08

@axsk this looks unrelated to me. This line

[3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})

says that something is producing both ((), nothing) and ((;), nothing) as gradients for the same object, which has field names :data, :itr. That shouldn't be possible, but it might be an easy bug to introduce, e.g. by having two f(x::T...) methods and calling f() somewhere. If you have a MWE I'm sure this can be hunted down.
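
For illustration, the leaf-level clash in frame [1] can be reproduced by hand with Zygote's internal accum:

julia> using Zygote

julia> Zygote.accum((), (;))  # empty Tuple vs empty NamedTuple for the same field
ERROR: MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})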

mcabbott commented Aug 23 '22 13:08

In that case I am sorry for the noise, and thank you for your suggestions. I checked my code (not a true MWE, but only 80 lines) for them but couldn't find anything like that :(

axsk commented Aug 23 '22 15:08

StochasticDiffEq, SciMLSensitivity and Lux are a touch over 80 lines :P. You may have more luck rewriting the sum over a generator to sum(_ -> msolve(prob, ps=ps_), 1:n), as sketched below. Usually this is both faster and less brittle.
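
i.e. something like (a sketch, reusing the logvar/msolve signatures from the snippet above):

function logvar(prob; ps_=prob.p, n=100)
    # sum(f, xs) takes a dedicated rule rather than the generic _foldl_impl
    # pullback seen in the trace above
    sum(_ -> msolve(prob, ps=ps_), 1:n)
end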

ToucheSir commented Aug 24 '22 02:08

Just so https://github.com/FluxML/Zygote.jl/issues/1290#issuecomment-1221431340 doesn't get buried, here is a summary of the issue and a proposal for how to address it.

For many rules, Zygote uses _project to try to harmonize input and differential types. That round-trips through CRC.ProjectTo. One consequence of using _project is that it recursively unwraps Zygote's own representation for mutable structural tangents (Ref{Any}(NamedTuple(...))) into just the wrapped NamedTuple. This makes a lot of sense for the user-facing interface, but if done between rules it can lead to scenarios where one path unwraps immutabletangent(mutabletangent(...)) -> immutabletangent(immutabletangent(...)) and then tries to accumulate that with a still-wrapped tangent.
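
Conceptually the unwrap looks something like this (a hypothetical helper, not Zygote's actual code):

# Recursively replace a Ref{Any}-wrapped mutable tangent with the NamedTuple
# it carries, which is roughly what _project ends up doing via ProjectTo
unwrap_mutable(dx::Base.RefValue) = unwrap_mutable(dx[])
unwrap_mutable(dx::NamedTuple) = map(unwrap_mutable, dx)
unwrap_mutable(dx) = dx

unwrap_mutable((prediction = Base.RefValue{Any}((contents = 4.0,)),))
# (prediction = (contents = 4.0,),)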

Why was this not showing up before? Simply put, Zygote was throwing away a lot of gradients of mutable types because they took the same code path as implicit params (shoved into the cache and summarily ignored). That doesn't seem to have led to major issues, but it is incorrect if you wanted said gradients at any point. 0.6.44 fixed this by not hitting the implicit-param path unless one specifically asks for it, but that turned up this unforeseen edge case in _project.

What can be done? I think the most straightforward solution is to have separate projection routines for what is used at the user <-> AD boundary (i.e. pullback) and what is used internally. Alternative ideas are very much welcome too. The only constraint I can think of is that gradient must never return Ref{Any} at the top-level or nested inside the grads unless the original argument had a Ref there as well.
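
For concreteness, a minimal sketch of what that split might look like (hypothetical names, not existing Zygote API; _project is the internal function linked above):

using Zygote

# Between rules, keep Zygote's own Ref{Any} tangent representation intact so
# wrapped and unwrapped tangents never meet inside accum...
_project_internal(x, dx) = dx

# ...and only do the lossy CRC.ProjectTo round trip once, at the user-facing
# boundary (i.e. before gradient/pullback hands results back to the caller).
_project_user(x, dx) = Zygote._project(x, dx)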

ToucheSir commented Sep 09 '22 00:09

Is https://github.com/FluxML/Zygote.jl/issues/1304 the same?

ChrisRackauckas commented Sep 09 '22 00:09

Most likely, yes. Since that doesn't involve Core.Box though, it's hard to say for certain without a MWE.

ToucheSir commented Sep 09 '22 00:09

If the motivation is to unify the return types presented to the user, then we can unwrap the Refs as a last pass. The "implicit params code paths" would be something to investigate; do you have specific instances of those in mind?

Adding projection primitives is one way, but it's not clear whether they solve the issue. It's better to have conversion rules that Julia can use instead.


DhairyaLGandhi commented Sep 09 '22 09:09