
Zero gradient for product over ForwardDiff.Dual numbers

michel2323 opened this issue:

The following example tests reduce(*,x) with

  1. Enzyme
  2. Enzyme + ForwardDiff.Dual with no tangent.
  3. Enzyme + ForwardDiff.Dual with 1 tangent
  4. Enzyme through a KA kernel with KernelGradients (no ForwardDiff.Dual)

Checks 3 and 4 print failures (error1.txt and error2.txt, respectively). Setup (with latest Enzyme main):

  [7da242da] Enzyme v0.10.3
  [f6369f11] ForwardDiff v0.10.30
  [63c18a36] KernelAbstractions v0.8.3
  [e5faadeb] KernelGradients v0.1.2
  [8dfed614] Test
using ForwardDiff
using KernelAbstractions
using KernelGradients
using Enzyme
using Test

function speelpenning(x)
    reduce(*, x)
end


function test_gradient(x::Vector{Float64}, gradient::Vector{Float64})
    errg = 0.0
    y = speelpenning(x)
    for (i, v) in enumerate(x)
        # d/dx_i reduce(*, x) = y / x_i; compare against the passed-in gradient, not the global dx
        errg += abs(gradient[i] - y/v)
    end
    @test isapprox(errg, 0, atol=1e-15)
    return errg
end


@kernel function speelpenning_kernel(y, x)
    I = @index(Global, Linear)

    if I == 1
        for v in x
            y[1] *= v
        end
    end
    @synchronize
end

n = 16
nthreads = 4
x = [i/(1.0+i) for i in 1:n]
y = 1.0
arr_y = [y]
red = ones(eltype(x), nthreads)
speelpenning_ka = speelpenning_kernel(CPU(), nthreads, n)
wait(speelpenning_ka(arr_y, x))
println("Speelpenning(x): ", arr_y[1])
y = speelpenning(x)
isapprox(arr_y[1], speelpenning(x))

dx = zeros(n)
autodiff(speelpenning, Duplicated(x,dx))
# FIRST CHECK
test_gradient(x, dx)
dual_x = ForwardDiff.Dual.(x)
dual_dx = ForwardDiff.Dual.(dx)
# Works and returns correct values
autodiff(speelpenning, Duplicated(dual_x, dual_dx))
# SECOND CHECK
test_gradient(ForwardDiff.value.(dual_x), ForwardDiff.value.(dual_dx))
# Now with a tangent attached
dx = zeros(n)
seeds = zeros(n)
dual_x = ForwardDiff.Dual.(x, seeds)
dual_dx = ForwardDiff.Dual.(dx, seeds)
# Breaks with ForwardDiff.Dual numbers with a tangent. Output in error1 file
autodiff(speelpenning, Duplicated(dual_x, dual_dx))
# THIRD CHECK
test_gradient(ForwardDiff.value.(dual_x), ForwardDiff.value.(dual_dx))


println("$(y-1/(1.0+n)) error in function")
println("$(test_gradient(x, dx)) error in gradient")

arr_y = ones(1)
darr_y = ones(1)

dx = zeros(n)

# Does work now
wait(autodiff(speelpenning_ka)(Duplicated(arr_y, darr_y), Duplicated(x, dx)))
# FOURTH CHECK
test_gradient(x, dx)

michel2323 — Nov 04 '21

Is this still an issue on current Enzyme?

vchuravy — Jun 26 '22

Updated the example. The KernelGradients.jl one had mistakes and now returns a correct gradient.

The third check still fails, though not with an error: it just returns a zero gradient, which I believe should not be the case. The second check is the same code, just without a tangent attached; attaching a tangent should not change the gradient in the vector.

michel2323 — Jun 27 '22

So the third check is without KA?

using ForwardDiff
using Enzyme
using Test

function speelpenning(x)
    reduce(*, x)
end

function test_gradient(x::Vector{Float64}, gradient::Vector{Float64})
    errg = 0.0
    y = speelpenning(x)
    for (i, v) in enumerate(x)
        # d/dx_i reduce(*, x) = y / x_i; compare against the passed-in gradient, not the global dx
        errg += abs(gradient[i] - y/v)
    end
    @test isapprox(errg, 0, atol=1e-15)
    return errg
end

n = 16
x = [i/(1.0+i) for i in 1:n]
y = speelpenning(x)
@show isapprox(y, 1.0)

dx = zeros(n)
seeds = zeros(n)
dual_x = ForwardDiff.Dual.(x, seeds)
dual_dx = ForwardDiff.Dual.(dx, seeds)
# Breaks with ForwardDiff.Dual numbers with a tangent. Output in error1 file
autodiff(speelpenning, Duplicated(dual_x, dual_dx))
# THIRD CHECK
test_gradient(ForwardDiff.value.(dual_x), ForwardDiff.value.(dual_dx))

println("$(test_gradient(x, dx)) error in gradient")

So:

julia> autodiff(speelpenning, Active{Float64}, Duplicated(x, dx))

julia> dx
16-element Vector{Float64}:
 0.11764705882352941
 0.08823529411764706
 0.0784313725490196
 0.07352941176470588
 0.07058823529411765
 0.06862745098039216
 0.06722689075630253
 0.06617647058823531
 0.06535947712418302
 0.0647058823529412
 0.06417112299465243
 0.06372549019607844
 0.06334841628959277
 0.06302521008403364
 0.06274509803921571
 0.06250000000000003
julia> autodiff(speelpenning, Active{Float64}, Duplicated(dual_x, dual_dx))
()

julia> dual_dx
16-element Vector{ForwardDiff.Dual{Nothing, Float64, 1}}:
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)
 Dual{Nothing}(0.0,0.0)

julia> dual_x
16-element Vector{ForwardDiff.Dual{Nothing, Float64, 1}}:
 Dual{Nothing}(0.5,0.0)
 Dual{Nothing}(0.6666666666666666,0.0)
 Dual{Nothing}(0.75,0.0)
 Dual{Nothing}(0.8,0.0)
 Dual{Nothing}(0.8333333333333334,0.0)
 Dual{Nothing}(0.8571428571428571,0.0)
 Dual{Nothing}(0.875,0.0)
 Dual{Nothing}(0.8888888888888888,0.0)
 Dual{Nothing}(0.9,0.0)
 Dual{Nothing}(0.9090909090909091,0.0)
 Dual{Nothing}(0.9166666666666666,0.0)
 Dual{Nothing}(0.9230769230769231,0.0)
 Dual{Nothing}(0.9285714285714286,0.0)
 Dual{Nothing}(0.9333333333333333,0.0)
 Dual{Nothing}(0.9375,0.0)
 Dual{Nothing}(0.9411764705882353,0.0)

vchuravy — Jun 30 '22

I think the issue here is that, since the Dual return carries two values, it is ambiguous which return component you are differentiating with respect to. In fact, since you didn't specify the activity of the return, it is assumed to be Const here (hence the zeros). See the following code, where you can explicitly set what you want to differentiate with respect to:

using ForwardDiff
using Enzyme
using Test

# Enzyme.API.printall!(true)

function speelpenning(x)
    x[1] * x[2]
end

function speelpenning2(y, x)
    y[] = speelpenning(x)
    return nothing
end

function test_gradient(x::Vector{Float64}, gradient::Vector{Float64})
    errg = 0.0
    y = speelpenning(x)
    for (i, v) in enumerate(x)
        # d/dx_i reduce(*, x) = y / x_i; compare against the passed-in gradient, not the global dx
        errg += abs(gradient[i] - y/v)
    end
    @test isapprox(errg, 0, atol=1e-15)
    return errg
end

n = 2
x = [i/(1.0+i) for i in 1:n]
y = speelpenning(x)
@show isapprox(y, 1.0)

dx = zeros(n)
seeds = zeros(n)
dual_x = ForwardDiff.Dual.(x, seeds)
dual_dx = ForwardDiff.Dual.(dx, seeds)
# Breaks with ForwardDiff.Dual numbers with a tangent. Output in error1 file
autodiff(speelpenning, Const, Duplicated(dual_x, dual_dx))
# THIRD CHECK
@show "Const regular"
@show dual_x
@show dual_dx

dx = zeros(n)
seeds = zeros(n)
dual_x = ForwardDiff.Dual.(x, seeds)
dual_dx = ForwardDiff.Dual.(dx, seeds)
# Breaks with ForwardDiff.Dual numbers with a tangent. Output in error1 file
autodiff(speelpenning, Active, Duplicated(dual_x, dual_dx))
# THIRD CHECK
@show "Active regular"
@show dual_x
@show dual_dx

y = Ref(ForwardDiff.Dual(0., 0.))
dy = Ref(ForwardDiff.Dual(0., 1.))
dx = zeros(n)
seeds = zeros(n)
dual_x = ForwardDiff.Dual.(x, seeds)
dual_dx = ForwardDiff.Dual.(dx, seeds)
autodiff(speelpenning2, Duplicated(y, dy), Duplicated(dual_x, dual_dx))
@show "Shadow 0, 1"
@show dual_x
@show dual_dx

y = Ref(ForwardDiff.Dual(0., 0.))
dy = Ref(ForwardDiff.Dual(1., 1.))
dx = zeros(n)
seeds = zeros(n)
dual_x = ForwardDiff.Dual.(x, seeds)
dual_dx = ForwardDiff.Dual.(dx, seeds)
autodiff(speelpenning2, Duplicated(y, dy), Duplicated(dual_x, dual_dx))
@show "Shadow 1, 1"
@show dual_x
@show dual_dx

y = Ref(ForwardDiff.Dual(0., 0.))
dy = Ref(ForwardDiff.Dual(0., 0.))
dx = zeros(n)
seeds = zeros(n)
dual_x = ForwardDiff.Dual.(x, seeds)
dual_dx = ForwardDiff.Dual.(dx, seeds)
autodiff(speelpenning2, Duplicated(y, dy), Duplicated(dual_x, dual_dx))
@show "Shadow 0, 0"
@show dual_x
@show dual_dx

# test_gradient(ForwardDiff.value.(dual_x), ForwardDiff.value.(dual_dx))

# println("$(test_gradient(x, dx)) error in gradient")

wsmoses — Jul 24 '22

wmoses@beast:~/git/Enzyme.jl ((HEAD detached at origin/main)) $ ./julia-1.7.2/bin/julia --project fwdcomb.jl 
isapprox(y, 1.0) = false
"Const regular" = "Const regular"
dual_x = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.5,0.0), Dual{Nothing}(0.6666666666666666,0.0)]
dual_dx = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.0,0.0), Dual{Nothing}(0.0,0.0)]
"Active regular" = "Active regular"
dual_x = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.5,0.0), Dual{Nothing}(0.6666666666666666,0.0)]
dual_dx = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.6666666666666666,0.0), Dual{Nothing}(0.5,0.0)]
"Shadow 0, 1" = "Shadow 0, 1"
dual_x = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.5,0.0), Dual{Nothing}(0.6666666666666666,0.0)]
dual_dx = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.0,0.6666666666666666), Dual{Nothing}(0.0,0.5)]
"Shadow 1, 1" = "Shadow 1, 1"
dual_x = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.5,0.0), Dual{Nothing}(0.6666666666666666,0.0)]
dual_dx = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.6666666666666666,0.6666666666666666), Dual{Nothing}(0.5,0.5)]
"Shadow 0, 0" = "Shadow 0, 0"
dual_x = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.5,0.0), Dual{Nothing}(0.6666666666666666,0.0)]
dual_dx = ForwardDiff.Dual{Nothing, Float64, 1}[Dual{Nothing}(0.0,0.0), Dual{Nothing}(0.0,0.0)]

wsmoses — Jul 24 '22
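From these outputs, the reverse seed placed in dy appears to propagate component-wise into the Dual shadow: seeding only the partials component ("Shadow 0, 1") puts the gradient into the partials slots, while seeding both ("Shadow 1, 1") fills both slots. By that pattern, seeding only the value component should recover the plain Float64 gradient in the value slots of dual_dx. A minimal sketch of that variant, reusing speelpenning and speelpenning2 from the comment above; the expected result is an inference from the outputs shown, not something demonstrated in the thread:

using ForwardDiff
using Enzyme

# Same 2-element product and mutating wrapper as in the comment above.
function speelpenning(x)
    x[1] * x[2]
end

function speelpenning2(y, x)
    y[] = speelpenning(x)
    return nothing
end

n = 2
x = [i/(1.0 + i) for i in 1:n]

# Seed only the value component of the output shadow.
y  = Ref(ForwardDiff.Dual(0., 0.))
dy = Ref(ForwardDiff.Dual(1., 0.))

dual_x  = ForwardDiff.Dual.(x, zeros(n))
dual_dx = ForwardDiff.Dual.(zeros(n), zeros(n))

autodiff(speelpenning2, Duplicated(y, dy), Duplicated(dual_x, dual_dx))

# Expected (by analogy with the "Shadow 0, 1" and "Shadow 1, 1" runs):
# dual_dx ≈ [Dual(0.666..., 0.0), Dual(0.5, 0.0)], i.e. the plain gradient
# sitting in the value components.
@show dual_dx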

@michel2323 could you add the one you care about as a test?

vchuravy — Jul 24 '22

@michel2323

wsmoses — Oct 06 '22

Actually, here is the behavior that I am not sure is desired:

using ForwardDiff
using Enzyme
using Test

function speelpenning(x)
    reduce(*, x)
end

n = 16
x = [i/(1.0+i) for i in 1:n]
dx = zeros(n)
autodiff(speelpenning, Duplicated(x,dx))
dual_x = ForwardDiff.Dual.(x)
dual_dx = ForwardDiff.Dual.(zeros(n))
autodiff(speelpenning, Duplicated(dual_x, dual_dx))
@show dx
@show dual_dx

@test ForwardDiff.Dual.(dx) ≈ dual_dx # fails

This means that computing with dual numbers changes the derivative result if active returns are used in first-order code and dual numbers are then used to calculate second-order derivatives. Doesn't this imply that no active returns should be used in code where ForwardDiff is also used? Is it possible to throw an error? (When using Ref() it throws an error.) Is the user expected to use Ref()?

michel2323 — Oct 10 '22

I think what happens here is that Enzyme just doesn't automatically determine that a Dual return type should be considered active, and falls back to Const. See:


julia> autodiff(speelpenning, Duplicated(dual_x, dual_dx))
()

julia> dual_dx
16-element Vector{ForwardDiff.Dual{Nothing, Float64, 0}}:
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
 Dual{Nothing}(0.0)
julia> autodiff(speelpenning, Active, Duplicated(dual_x, dual_dx))
()

julia> dual_dx
16-element Vector{ForwardDiff.Dual{Nothing, Float64, 0}}:
Dual{Nothing}(0.11764705882352941)
Dual{Nothing}(0.08823529411764706)
Dual{Nothing}(0.0784313725490196)
Dual{Nothing}(0.07352941176470588)
Dual{Nothing}(0.07058823529411765)
Dual{Nothing}(0.06862745098039216)
Dual{Nothing}(0.06722689075630253)
Dual{Nothing}(0.06617647058823531)
Dual{Nothing}(0.06535947712418302)
Dual{Nothing}(0.0647058823529412)
Dual{Nothing}(0.06417112299465243)
Dual{Nothing}(0.06372549019607844)
Dual{Nothing}(0.06334841628959277)
Dual{Nothing}(0.06302521008403364)
Dual{Nothing}(0.06274509803921571)
Dual{Nothing}(0.06250000000000003)

wsmoses — Oct 10 '22

The (reasonable) reason for the iffiness is that for a regular floating-point return it's clear that it is the value being differentiated with respect to. For a tuple-like return (as here), it's unclear whether you want the derivative wrt the first value, wrt the second, the sum of the derivatives wrt both, etc.

wsmoses — Oct 10 '22

You should be able to express the specific behavior you want by either returning the primal or the dual component in the function being differentiated (e.g. reduce(*, x).value or reduce(*, x).partials), or by using the Ref pattern like above.

wsmoses — Oct 10 '22
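A minimal sketch of the first suggestion (returning only the primal component, so the return is a plain Float64 whose activity is unambiguous). The helper name speelpenning_value is illustrative, and the call uses the same Enzyme v0.10-style autodiff as the rest of the thread:

using ForwardDiff
using Enzyme

# Return only the primal component of the Dual-valued product, so Enzyme
# sees an ordinary Float64 return and can treat it as Active.
speelpenning_value(x) = reduce(*, x).value

n = 16
x = [i/(1.0 + i) for i in 1:n]

dual_x  = ForwardDiff.Dual.(x, zeros(n))
dual_dx = ForwardDiff.Dual.(zeros(n), zeros(n))

autodiff(speelpenning_value, Active, Duplicated(dual_x, dual_dx))

# The gradient of the primal product should now accumulate in the value
# components of dual_dx, matching the plain Vector{Float64} result above.
@show ForwardDiff.value.(dual_dx)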

Closing. @michel2323, please add a test from this if you have a chance.

wsmoses — Dec 15 '22