
derivative of `norm` at 0

Open goretkin opened this issue 7 years ago • 4 comments

`norm` is not differentiable at 0, so at best you can return a subgradient. It appears that ForwardDiff returns a subgradient of 1.0 at 0.0 (and -1.0 at -0.0).

julia> ForwardDiff.gradient(norm, [0.0, 0.0])
2-element Array{Float64,1}:
 0.0
 1.0

julia> ForwardDiff.gradient(norm, [0.0, -0.0])
2-element Array{Float64,1}:
 -0.0
 -1.0

I'm wondering if it would be worth defining `Base.norm` on `ForwardDiff.Dual` so that it returns a subgradient of 0.0 at both 0.0 and -0.0.

Also, perhaps I missed this, but I think it would be nice to mention somewhere that in generic auto-diffable code `sqrt(sum(v.^2))` should be replaced with `norm`: `sqrt` has a singular derivative at 0, so composing it with a function whose gradient is 0 produces a NaN (0*Inf = NaN).
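The 0*Inf mechanism can be seen with plain scalars, no dual numbers needed (names below are illustrative, not part of any API):

```julia
# d/dx sqrt(x) = 1/(2*sqrt(x)), which is Inf at x = 0; the chain rule
# multiplies that by the inner derivative, which is 0 at v = 0.
dsqrt(x) = 1 / (2 * sqrt(x))
inner  = 0.0   # value of sum(v.^2) at v = 0
dinner = 0.0   # its derivative at v = 0
chain = dsqrt(inner) * dinner
isnan(chain)   # true: Inf * 0.0 = NaN
```

This is exactly what a forward-mode tool hits when it pushes a zero partial through `sqrt` at a zero value.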

goretkin avatar Jul 17 '17 17:07 goretkin

Here is an interesting effect that I am guessing is related:

using ForwardDiff, StaticArrays
# - ForwardDiff                   0.7.3
# - StaticArrays                  0.6.6
u = x ->  (1.0 + norm(x)^2)^(-1/2)
∇u = x -> ForwardDiff.gradient(u, x)
∇u(zeros(2))
# 2-element Array{Float64,1}:
#  -0.0
#  -0.0
∇u(@SVector zeros(2))
# 2-element SVector{2,Float64}:
#  NaN
#  NaN

(`u = x -> (1.0 + sum(x.^2))^(-1/2)` works fine for both.)
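A scalar chain-rule check suggests why the `sum(x.^2)` form is safe at the origin: the outer function is smooth in `s = sum(x.^2)`, and the inner derivative `2x` is finite everywhere, so no singular factor ever appears (names below are illustrative):

```julia
# Outer function of s = sum(x.^2) and its derivative, both finite at s = 0.
f(s)  = (1.0 + s)^(-1/2)
df(s) = -0.5 * (1.0 + s)^(-3/2)
g = df(0.0) * 0.0   # chain rule at x = 0, where ds/dx = 2x = 0
isfinite(g)         # true: the gradient is -0.0, not NaN
```

With `norm(x)^2`, by contrast, the intermediate `norm` itself is differentiated, and its singular derivative at 0 can surface depending on the code path, which may explain the Array/SVector discrepancy above.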

cortner avatar Mar 02 '18 15:03 cortner

Here's something to think about related to this issue:

using ForwardDiff
using LinearAlgebra

# start with zero valued vector of dual numbers
v = zeros(ForwardDiff.Dual{Nothing, Float64, 1}, 3);

# assume a perturbation of one component exists due to some computational noise
value = 1.0e-200 # so that value^2 == 0.0 (due to machine precision)
partial = 1.0e-100 # so that 2*value*partial != 0.0 (due to machine precision)
v[1] = ForwardDiff.Dual{Nothing}(value, partial);

# try out the two methods
norm(v)
# Dual{Nothing}(1.0e-200,NaN)
sqrt(sum(v.^2))
# Dual{Nothing}(0.0,Inf)

Both implementations result in NaNs propagated through the rest of the computation, even in NaN-safe mode. I encountered this when propagating derivatives through a Newton solve and it took me a lot of time to find the issue.
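One possible guard, sketched below with plain floats (this is an illustrative sketch, not ForwardDiff's API and not necessarily the workaround used by the commenters): branch on the squared sum before it reaches `sqrt`'s singular derivative. With `Dual` inputs, the comparison acts on the primal value and `zero(x)` also zeroes the partials, which amounts to choosing a subgradient of 0 at the origin.

```julia
# Guarded square root: never differentiate sqrt at (or below) zero.
safesqrt(x) = x <= zero(x) ? zero(x) : sqrt(x)
safenorm(v) = safesqrt(sum(abs2, v))

safenorm([0.0, 0.0])  # 0.0, with no Inf/NaN on the derivative path
safenorm([3.0, 4.0])  # 5.0, agrees with norm away from the origin
```

The trade-off is that any partials attached to an underflowed-to-zero value (as in the `1.0e-200` example above) are discarded rather than turned into NaN.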

taylormcd avatar Oct 08 '19 02:10 taylormcd

(...) and it took me a lot of time to find the issue.

Could you please share how you ended up working around it?

ferrolho avatar Jan 22 '20 12:01 ferrolho

I created a new issue related to my comment as it is somewhat tangential to this issue. I'll post my workaround there.

taylormcd avatar Apr 13 '21 17:04 taylormcd