Zygote.jl icon indicating copy to clipboard operation
Zygote.jl copied to clipboard

nested gradient with hessian

Open YichengDWu opened this issue 1 year ago • 16 comments

Reverse on forward on reverse:

julia> function f1(x, ps)  # [edit: renamed not to clash]
       hess = Zygote.hessian(x->sum(x.^3), x)
       return hess * x .+ ps.bias
       end
f1 (generic function with 1 method)

julia> x = rand(3)
3-element Vector{Float64}:
 0.9750274052932691
 0.015723824729741764
 0.9792305251283961

julia> ps = (;bias = rand(3))
(bias = [0.6184575461887033, 0.24789621977449272, 0.5451996227584986],)

julia> f1(x,ps)
3-element Vector{Float64}:
 6.322528192626253
 0.24937965175928256
 6.298554150817905

julia> Zygote.gradient(p -> sum(f1(x,p)), ps)
ERROR: Mutating arrays is not supported -- called setindex!(Matrix{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/dev/limitations.html#Array-mutation-1

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] _throw_mutation_error(f::Function, args::Matrix{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:70
  [3] (::Zygote.var"#444#445"{Matrix{Float64}})(#unused#::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:82
  [4] (::Zygote.var"#2496#back#446"{Zygote.var"#444#445"{Matrix{Float64}}})(Δ::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67
  [5] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:31 [inlined]
  [6] (::typeof(∂(forward_jacobian)))(Δ::Tuple{Nothing, Matrix{Float64}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
  [7] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:44 [inlined]
  [8] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:43 [inlined]
  [9] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:76 [inlined]
 [10] (::typeof(∂(hessian_dual)))(Δ::Matrix{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [11] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:74 [inlined]
 [12] Pullback
    @ .\REPL[119]:2 [inlined]
 [13] (::typeof(∂(f)))(Δ::FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [14] Pullback
    @ .\REPL[123]:1 [inlined]
 [15] (::typeof(∂(#97)))(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [16] (::Zygote.var"#60#61"{typeof(∂(#97))})(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:41
 [17] gradient(f::Function, args::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:76
 [18] top-level scope
    @ REPL[123]:1

Forward on forward on reverse

julia> Zygote.forward_jacobian(p -> sum(f1(x,p)), ps)
ERROR: MethodError: no method matching size(::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
Closest candidates are:
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:567
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}, ::Integer) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:566
  size(::Union{LinearAlgebra.Cholesky, LinearAlgebra.CholeskyPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\cholesky.jl:494
  ...
Stacktrace:
 [1] seed(x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, ::Val{1}, offset::Int64) (repeats 2 times)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:7
 [2] forward_jacobian(f::var"#99#100", x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, #unused#::Val{1})
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:29
 [3] forward_jacobian(f::Function, x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}; chunk_threshold::Int64)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:44
 [4] forward_jacobian(f::Function, x::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:43
 [5] top-level scope
   @ REPL[124]:1

It would be great to support the second mode. Looks like it won't take too much to achieve that. If I change ps to a vector it can work smoothly.

julia> function f2(x, bias)
       hess = Zygote.hessian(x->sum(x.^3), x)
       return hess * x .+ bias
       end
f2 (generic function with 1 method)

julia> Zygote.forward_jacobian(p -> sum(f2(x,p)), rand(3))
(13.320223875130782, [1.0; 1.0; 1.0;;])

YichengDWu avatar Jul 17 '22 23:07 YichengDWu

It boils down to generalizing Zygote.seed to NamedTuple

julia> Zygote.seed(ps,Val(12))
ERROR: MethodError: no method matching size(::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
Closest candidates are:
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:567
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}, ::Integer) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:566
  size(::Union{LinearAlgebra.Cholesky, LinearAlgebra.CholeskyPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\cholesky.jl:494
  ...
Stacktrace:
 [1] seed(x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, ::Val{12}, offset::Int64) (repeats 2 times)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:7
 [2] top-level scope
   @ REPL[160]:1
 [3] top-level scope
   @ C:\Users\Luffy\.julia\packages\CUDA\tTK8Y\src\initialization.jl:52

YichengDWu avatar Jul 18 '22 02:07 YichengDWu

So this is another bug i guess

julia> function f(x, bias)
              jac = Zygote.jacobian(x->x.^3, x)[1]
              return jac * x .+ bias
              end
f (generic function with 1 method)

julia> x,bias = rand(3),rand(3)
([0.2279638899624825, 0.6476786632858718, 0.13745627655377346], [0.051516386842686224, 0.6842360463718182, 0.22031281411507742])

julia> Zygote.gradient(b -> sum(f(x,b)), rand(3))
ERROR: Mutating arrays is not supported -- called copyto!(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/dev/limitations.html#Array-mutation-1

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] _throw_mutation_error(f::Function, args::SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:70
  [3] (::Zygote.var"#448#449"{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}})(#unused#::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:85
  [4] (::Zygote.var"#2506#back#450"{Zygote.var"#448#449"{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}})(Δ::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67
  [5] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:183 [inlined]
  [6] (::typeof(∂(_gradcopy!)))(Δ::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
  [7] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:165 [inlined]
  [8] (::typeof(∂(withjacobian)))(Δ::NamedTuple{(:val, :grad), Tuple{Nothing, Tuple{Matrix{Float64}}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
  [9] (::Zygote.var"#216#217"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, typeof(∂(withjacobian))})(Δ::NamedTuple{(:val, :grad), Tuple{Nothing, Tuple{Matrix{Float64}}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\lib.jl:207
 [10] #1909#back
    @ C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67 [inlined]
 [11] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:140 [inlined]
 [12] (::typeof(∂(jacobian)))(Δ::Tuple{Matrix{Float64}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [13] Pullback
    @ .\REPL[20]:2 [inlined]
 [14] (::typeof(∂(f)))(Δ::FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [15] Pullback
    @ .\REPL[22]:1 [inlined]
 [16] (::typeof(∂(#23)))(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [17] (::Zygote.var"#60#61"{typeof(∂(#23))})(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:41
 [18] gradient(f::Function, args::Vector{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:76
 [19] top-level scope
    @ REPL[22]:1
 [20] top-level scope
    @ C:\Users\Luffy\.julia\packages\CUDA\DfvRa\src\initialization.jl:52

YichengDWu avatar Jul 19 '22 21:07 YichengDWu

Zygote's jacobian function isn't Zygote-differentiable. There's no major barrier to making it so, someone just has to do it. I think the most relevant issue for this is https://github.com/FluxML/Zygote.jl/issues/953 . That's pure-Zygote, unlike hessian which involves ForwardDiff... maybe this issue can be about that alone?

mcabbott avatar Jul 19 '22 22:07 mcabbott

Mathematically, I did not differentiate the jacobian itself. The Jacobian here should be treated as a constant. This could be a dumb question but why can't Zygote tell from the input of Zygote.jacobian to avoid redundant differentiation?

YichengDWu avatar Jul 19 '22 22:07 YichengDWu

Zygote does not know this, unfortunately. It works backwards from the final result, in complete ignorance of which branches of the expression tree ultimately lead to gradients you do or do not want.

mcabbott avatar Jul 19 '22 22:07 mcabbott

More specifically, Zygote will run the AD transform on any function which doesn't have an existing rrule/@adjoint (N.B: @nograd and @non_differentiable generate these automatically), and generate an output which is used in the primal computation. For your particular case, ChainRulesCore provides a manual stop gradient operation via https://juliadiff.org/ChainRulesCore.jl/stable/api.html#ChainRulesCore.@ignore_derivatives and co.

ToucheSir avatar Jul 19 '22 22:07 ToucheSir

Ok, I get it now. What also confuses me is the error message here. It Mutating arrays is not supported not no adjoint for Zygote.jacobian. Quite misleading to me.

YichengDWu avatar Jul 19 '22 22:07 YichengDWu

Just because Zygote fails on a function due to it using mutation does not mean the solution is to write an adjoint for it. Alternatives may include rewriting the offending bits to not use mutation, using Buffer, marking the function or a function higher up the call stack @non_differentiable, writing a custom rule for a function higher up the stack, etc.

ToucheSir avatar Jul 19 '22 22:07 ToucheSir

In essence, It looks like the same kind of failure as the reverse on forward on reverse given that they have similar error messages.

For forward on forward on reverse, after adding a method to Zygote.seed it is working fine https://github.com/jonniedie/ComponentArrays.jl/issues/149

YichengDWu avatar Jul 19 '22 22:07 YichengDWu

no adjoint for Zygote.jacobian. Quite misleading

The reason for this is that looking inside unknown functions is literally what it does for a living. It doesn't stop at the call to Zygote.jacobian because this is nothing special; it hopes to successfully figure out what happens inside. And it fails because the function makes an array & writes into it.

Many failures thus have the same error message. Zygote over ForwardDiff is its own ball of problems besides mutation.

mcabbott avatar Jul 19 '22 22:07 mcabbott

Ah you're right, just a little whining

YichengDWu avatar Jul 19 '22 23:07 YichengDWu

I have a piece of code that involves multiple errors and it took me a long time to isolate each one 🥲

YichengDWu avatar Jul 19 '22 23:07 YichengDWu

Oh I know. My only advice is to start small and add things...

I've marked the jacobian posts "duplicate" since these are exactly #1268 now.

mcabbott avatar Jul 19 '22 23:07 mcabbott

Should I change the title of this issue to "hessian is not differentiable"?

YichengDWu avatar Jul 19 '22 23:07 YichengDWu

There are two hessian functions, and they are very short: https://github.com/FluxML/Zygote.jl/blob/7f2b16987dc4eab236aabd889444cabdcc6f9caf/src/lib/grad.jl#L74-L89

The all-zygote one would in principal be differentiable, if #1268 were solved.

julia> function f3(x, ps)
         hess = Zygote.hessian_reverse(x->sum(abs2, x.^3), x)
         return hess * x .+ ps.bias
       end
f3 (generic function with 1 method)

julia> Zygote.gradient(p -> sum(f3(x,p)), ps)
ERROR: Mutating arrays is not supported -- called copyto!(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)

However, 3rd order derivatives using Zygote are unlikely to ever be a good idea. I think the tests contain one like this, and it is very very slow:

julia> @time @eval sin'''(1.0)
120.620770 seconds (93.61 M allocations: 5.603 GiB, 13.76% gc time, 99.81% compilation time)
-0.5403023058681398

mcabbott avatar Jul 19 '22 23:07 mcabbott

Sure it's slow. Forward over forward over reverse is more reasonable. So this goes back to supporting Zygote.seed(x::NamedTuple,...)

YichengDWu avatar Jul 19 '22 23:07 YichengDWu