Regression in 1.0, asking for Hessian of a complex matrix function results in infinite loop
I've encountered this issue when running the tests of my package ConicQKD, which started to hang when I switched from ForwardDiff 0.10 to 1.0.
It took a lot of work to reduce it to a minimal working example. Apparently the problem appears when you compute the Hessian of a function that takes a complex matrix, sets several of its elements to zero, and asks for its eigenvalues. If I just ask for the eigenvalues without setting anything to zero, it's fine. Real matrices work without any problems, even when setting elements to zero.
To reproduce, call test_hessian(ComplexF64). You can also check the real case with test_hessian(Float64); it works fine.
using LinearAlgebra
import ForwardDiff
import GenericLinearAlgebra.eigen
import Random

function test_hessian(::Type{T}) where {T}
    Random.seed!(1)
    d = 3
    function barrier(point)
        d2 = div(d, 2)
        M = mat(point, T)
        M[1:d2, d2+1:end] .= 0
        return sum(eigvals(Hermitian(M)))
    end
    x = randn(T, d, d)
    point = rvec(x * x')
    fd_hess = ForwardDiff.hessian(barrier, point)
    return fd_hess
end

function rvec(M::AbstractMatrix{T}) where {T}
    if T <: Real
        return vec(M)
    else
        return [real(vec(M)); imag(vec(M))]
    end
end

function mat(v::AbstractVector, ::Type{T}) where {T}
    if T <: Real
        n = isqrt(length(v))
        return reshape(v, n, n)
    else
        l2 = div(length(v), 2)
        n = isqrt(l2)
        return reshape(v[1:l2] + im * v[l2+1:end], n, n)
    end
end
If you set d=2 in the above example you get NaN instead of an infinite loop, which might be easier to debug.
@andreasnoack perhaps you would know what is happening?
I'd love to help debug this, but I don't know anything about autodiff in general or ForwardDiff in particular.
IMO the main problem (which became apparent in recent releases but has existed for a long time) is that there exists a definition of eigvals(::Hermitian{<:Complex{<:ForwardDiff.Dual}}) (https://github.com/JuliaDiff/ForwardDiff.jl/blob/d81302372bc812fbc7f3e2eafa17b4b8f570d8ae/src/dual.jl#L751-L755) but not of eigen(::Hermitian{<:Complex{<:ForwardDiff.Dual}}). Currently, only https://github.com/JuliaDiff/ForwardDiff.jl/blob/d81302372bc812fbc7f3e2eafa17b4b8f570d8ae/src/dual.jl#L787 and https://github.com/JuliaDiff/ForwardDiff.jl/blob/d81302372bc812fbc7f3e2eafa17b4b8f570d8ae/src/dual.jl#L795-L800 are defined.
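To make the dispatch concrete, here is one way to probe which method gets picked (the exact printout depends on the installed versions, and the tag Nothing and single-partial Dual are arbitrary choices, so treat this only as a quick check):

using ForwardDiff, GenericLinearAlgebra, LinearAlgebra
D = ForwardDiff.Dual{Nothing, Float64, 1}
which(eigvals, Tuple{Hermitian{Complex{D}, Matrix{Complex{D}}}})  # should point at ForwardDiff's specialized method
which(eigen,   Tuple{Hermitian{Complex{D}, Matrix{Complex{D}}}})  # no such method in ForwardDiff, so a generic fallback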
I believe the definition in ForwardDiff should just be an optimization, and that going through the one in GenericLinearAlgebra should still work. The problem is with the NaNs, though. Because they always return false in comparisons, the loop never terminates. The algorithm shouldn't create any NaNs, but the Duals now cause NaNs. I'd have to take a closer look.
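For reference, NaNs compare false against everything, so any convergence or zero test can never succeed once a NaN sneaks in, which is exactly the recipe for an iteration that never stops:

julia> NaN == 0.0, NaN < 1e-12, abs(NaN) <= eps(), iszero(NaN)
(false, false, false, false)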
The NaNs seem to be created when the matrix is converted to real symmetric tridiagonal form. When d=2, the matrix is diagonal with (1,2) element
julia> tmp[1,2]
Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0)) + Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0))*im
During the transformation, LinearAlgebra.reflector! is called on tmp[1,2:2] where norm is computed, see https://github.com/JuliaLang/julia/blob/v1.11.6/stdlib/LinearAlgebra/src/generic.jl#L1593. It ends up calling norm on the single element, which is
julia> norm(tmp[1,2])
Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(0.0,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),Dual{ForwardDiff.Tag{var"#13#14"{ComplexF64, Int64}, Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN))
There is then a test for zero, to avoid scaling by a zero norm, see https://github.com/JuliaLang/julia/blob/v1.11.6/stdlib/LinearAlgebra/src/generic.jl#L1594-L1596. I guess that is what is now returning false where it presumably used to return true. I'm not completely sure how to handle this case according to the new ForwardDiff principles.
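If I'm reading it right, the NaN itself can be reproduced without any linear algebra: abs of a complex number is hypot(real, imag), and the hypot derivative rule divides by hypot(x, y), so at a zero Complex{Dual} the partials become 0/0. For example (just an illustrative probe, not code from the packages):

julia> using ForwardDiff

julia> ForwardDiff.gradient(v -> abs(complex(v[1], v[2])), [0.0, 0.0])
2-element Vector{Float64}:
 NaN
 NaN

and such a NaN then slips straight past the zero test in reflector!.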
That seems like a pretty fundamental problem, unrelated to GenericLinearAlgebra. Plenty of code branches on whether some value is 0 or 1 or whatever. Is ForwardDiff supposed to fail in those cases now? For example:
using ForwardDiff

function barrier(point)
    x, y = point
    if x == 0
        return y^2
    else
        return y^3
    end
end

function analytical_gradient(point)
    x, y = point
    if x == 0
        return [0, 2y]
    else
        return [0, 3y^2]
    end
end

function test_zero(p0)
    return ForwardDiff.gradient(barrier, p0), analytical_gradient(p0)
end
test_zero([0,1]) gives the correct answer on 0.10, but not on 1.0.
I'd suggest reading through https://github.com/JuliaDiff/ForwardDiff.jl/issues/480 and adding your comment there instead of rehashing the discussion here again.
So the answer is yes? ForwardDiff is supposed to fail in those cases now? Then please say so, instead of asking me to read a huge discussion that I have no interest in.
Can this regression be fixed after all? Or do I need to look for another autodiff package? That's a real problem. I tried Zygote, Mooncake, and Enzyme, and all three correctly computed the gradient for the toy example test_zero, but none of them could handle my original case because they don't seem to support differentiating through eigenvalues.
IMO the example is misleading/ill-posed and ForwardDiff 0.10, Zygote, Mooncake and Enzyme are incorrect (as well): The function $f: \mathbb{R}^2 \to \mathbb{R}$ of the second output component is not differentiable at $(0, 1)$ and hence the gradient at $(0, 1)$ does not exist.
The partial derivatives at $(0, 1)$ are $\partial_x f(0, 1) = 0$ and $\partial_y f(0, 1) = 2$. However, the function is not differentiable at $(0, 1)$ as the limit
$$ \lim_{(x, y) \to (0, 1)} \frac{f(x, y) - f(0, 1) - \partial_x f(0, 1) \cdot (x - 0) - \partial_y f(0, 1) \cdot (y - 1)}{| (x, y) - (0, 1)|} = \lim_{(x, y) \to (0, 1)} \frac{f(x, y) - 1 - 0 \cdot (x - 0) - 2 \cdot (y - 1)}{| (x, y) - (0, 1)|} $$
is not zero, e.g., it evaluates to
$$ \lim_{h \to 0} \frac{{(1 + h)}^3 - 1 - 2h}{\sqrt{2}|h|} = \lim_{h \to 0} \frac{h^3 + 3h^2 + h}{\sqrt{2}|h|} \neq 0 $$
on the path $(h, 1 + h) \to (0, 1)$.
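A quick numerical check of this difference quotient along that path (g and q are throwaway helpers, not part of the original example) shows it tending to $1/\sqrt{2}$ rather than 0:

julia> g((x, y)) = x == 0 ? y^2 : y^3;

julia> q(h) = (g((h, 1 + h)) - 1 - 2h) / (√2 * abs(h));

julia> q.([1e-1, 1e-3, 1e-6])  # roughly 0.93, 0.71, 0.71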
You can make the function differentiable at $(0, 1)$ by e.g. changing its definition to
$$ f(x, y) = \begin{cases} y^2 & \text{if } |x| < \epsilon, \\ y^3 & \text{otherwise}, \end{cases} $$
where $\epsilon > 0$ is some (possibly arbitrarily small) number as then
$$ \lim_{(x, y) \to (0, 1)} \frac{f(x, y) - 1 - 0 \cdot (x - 0) - 2 \cdot (y - 1)}{| (x, y) - (0, 1)|} = \lim_{(x, y) \to (0, 1)} \frac{{(y - 1)}^2}{\sqrt{x^2 + {(y - 1)}^2}} = 0, $$
regardless of how the limit is taken. Then the gradient exists at $(0, 1)$, and ForwardDiff 1 computes it correctly:
julia> using ForwardDiff

julia> function f(xy)
           x, y = xy
           return abs(x) < eps() ? y^2 : y^3
       end;

julia> ForwardDiff.gradient(f, [0, 1])
2-element Vector{Int64}:
 0
 2
BTW ForwardDiff 1 is computing the partial derivatives correctly if you request them instead of the gradient:
julia> function f(xy)
           x, y = xy
           return iszero(x) ? y^2 : y^3
       end;

julia> ForwardDiff.derivative(x -> f(vcat(x, 1)), 0)
0

julia> ForwardDiff.derivative(y -> f(vcat(0, y)), 1)
2
Can this regression be fixed after all?
The current behavior is more correct. As an indication, you can see the number of wrong-gradient issues that were fixed by https://github.com/JuliaDiff/ForwardDiff.jl/pull/481.
The problem in your case is more of a garbage-in, garbage-out situation (as the comment above showed).
There's no need to insult me. The toy example I gave was just for me to understand what was going on; it is not relevant. The real example is the one I opened the issue with. It's not "garbage". The Hessian is perfectly well-defined, and ForwardDiff should be able to compute it.
I'd suggest reading where that exact example was discussed instead of rehashing the discussion here again :)
I edited my comment to remove the toy example; I don't care about it. I don't want to read anything. I just wanted to know if the regression can be fixed. It seems that the answer is a clear no.
It can surely be made to work, and people are generously donating time here towards trying to isolate it. Part of that, unfortunately, does involve reading to understand the issue.
It's also not yet entirely obvious this is a regression, as there's some chance the old behaviour was silently wrong.
The NaNs seem to be created [...] ends up calling norm

Thanks for digging. Maybe worth noting the behaviours of norm:
julia> ForwardDiff.derivative.(norm, (-0.1, 0.0, 0.1))
(-1.0, 1.0, 1.0)
julia> ForwardDiff.gradient.(norm, ([-0.1], [0.0], [0.1]))
([-1.0], [1.0], [1.0]) # v0.10
([-1.0], [NaN], [1.0]) # v1
julia> ForwardDiff.hessian.(norm, ([-0.1], [0.0], [0.1]))
([0.0;;], [0.0;;], [0.0;;]) # v0.10
([0.0;;], [NaN;;], [0.0;;]) # v1
As you say, I think these changes are driven by branches within its implementation (it's trying to avoid underflow?). The gradient is undefined at 0, but in AD it's usually better to pick a value (a subgradient?) rather than return NaN. That is done in the scalar case, for instance ForwardDiff.derivative.(abs, (-0.1, 0.0, 0.1)).
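For the record, that scalar case picks the right-hand derivative at zero (at least on the versions I tried), which is the kind of convention I'd hope the vector norm would follow too:

julia> ForwardDiff.derivative.(abs, (-0.1, 0.0, 0.1))
(-1.0, 1.0, 1.0)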
The old behaviour was definitely correct, because I was checking it against the Hessian computed analytically.
The old behaviour was definitely correct,
I feel you are missing the forest for the trees. Just because there is one case that worked with the old behavior but does not work with the new behavior does not mean that the old behavior was correct or better. Many, many other derivatives that were wrong are now fixed with the new behavior.
This does not mean that the case that currently fails should not be fixed (see https://github.com/JuliaDiff/ForwardDiff.jl/pull/757). But can we focus on that one instead of griping about https://github.com/JuliaDiff/ForwardDiff.jl/pull/481 in general?
You misunderstand me. I have zero interest in arguing about #481. By "regression" I mean my code that broke. When I say "the old behaviour was correct" I just mean that it computed the Hessian correctly in my case.
My point about regression was that we only have your word on the old thing being correct, for some example. You will forgive me for taking it with a pinch of salt!
Formulating the eigenvalue calculation in a way that makes it easy to compare ForwardDiff to finite differences, and providing the input in a way that doesn't depend on version-specific RNG, would make it easier to check. And might provide a useful case for the tests.
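Something along these lines, say. The barrier here mirrors the one in the MWE but infers the size from the input, fd_hessian is a throwaway O(h^2) central-difference helper, and the hard-coded matrix is just an arbitrary Hermitian example, none of it taken from the package:

using LinearAlgebra
import ForwardDiff
import GenericLinearAlgebra

function barrier(point)
    l2 = div(length(point), 2)
    n = isqrt(l2)
    M = reshape(point[1:l2] + im * point[l2+1:end], n, n)
    M[1:div(n, 2), div(n, 2)+1:end] .= 0      # zero out the same block as in the MWE
    return sum(eigvals(Hermitian(M)))
end

# crude central-difference Hessian, good enough as a sanity reference
function fd_hessian(f, x; h = 1e-4)
    n = length(x)
    H = zeros(n, n)
    for i in 1:n, j in 1:n
        ei = zeros(n); ei[i] = h
        ej = zeros(n); ej[j] = h
        H[i, j] = (f(x + ei + ej) - f(x + ei - ej) - f(x - ei + ej) + f(x - ei - ej)) / (4h^2)
    end
    return H
end

# fixed Hermitian input written out explicitly, so nothing depends on the RNG
A = ComplexF64[2 1+im 0; 1-im 3 im; 0 -im 1]
point = [real(vec(A)); imag(vec(A))]

H_ref = fd_hessian(barrier, point)
H_ad  = ForwardDiff.hessian(barrier, point)   # hangs on the affected versions
maximum(abs, H_ref - H_ad)

On a version where the complex eigenvalue path works, the last line should be on the order of the finite-difference error, and the fixed input could go straight into the tests.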
It does seem likely that if the old code tests norm(x) == 0, then it can be made to produce wrong answers, à la #481, for some input matrices.
No, I won't forgive you. You think that both ForwardDiff 0.10 and my analytical Hessian might be wrong, and for some mysterious reason agree with each other? And also with the result of the new PR? That's not only implausible, it's a gratuitous insult.
I derived the analytical formula in this peer-reviewed paper. You can check the derivation yourself; it's not difficult. Checking that my code in fact implements the formula is not so straightforward, but it's easy to write non-optimized code implementing it. Then you can use it to check my code, the old ForwardDiff, and the new PR.