ForwardDiff.jl icon indicating copy to clipboard operation
ForwardDiff.jl copied to clipboard

`let` scope causes nested differentiation error

Open jrevels opened this issue 6 years ago • 15 comments

As reported by @tkoolen here. Note from my comment there - this does not seem to be a case of pertubation confusion, but rather an an inference problem of some sort (or possibly that ForwardDiff is incorrectly relying on inference behavior in some way).

The bug can be triggered with:

using ForwardDiff, Base.Test
@noinline f83a(z, x) = x[1]
z83a = ([(1, (2), [(3, (4, 5, [1, 2, (3, (4, 5), [5])]), (5))])])
let z = z83a
    g = x -> f83a(z, x)
    h = x -> g(x)
    @test ForwardDiff.hessian(h, [1.]) == zeros(1, 1)
end

If the let block is removed, then everything works normally. According to @tkoolen, this is coincidentally is fixed by #247, but it would be very valuable for us to figure out why.

ref https://github.com/tkoolen/RigidBodyDynamics.jl/issues/347 cc @goretkin

jrevels avatar Sep 25 '17 16:09 jrevels

Note that removing the let block changes the type of the Dual from

ForwardDiff.Dual{ForwardDiff.Tag{##32#34{##31#33{Array{Tuple{Int64,Int64,Array{Tuple{Int64,Tuple{Int64,Int64,Array{Any,1}},Int64},1}},1}}},ForwardDiff.Dual{ForwardDiff.Tag{Tuple{##32#34{##31#33{Array{Tuple{Int64,Int64,Array{Tuple{Int64,Tuple{Int64,Int64,Array{Any,1}},Int64},1}},1}}},ForwardDiff.#gradient},Float64},Float64,1}},ForwardDiff.Dual{ForwardDiff.Tag{Tuple{##32#34{##31#33{Array{Tuple{Int64,Int64,Array{Tuple{Int64,Tuple{Int64,Int64,Array{Any,1}},Int64},1}},1}}},ForwardDiff.#gradient},Float64},Float64,1},1}

to

ForwardDiff.Dual{ForwardDiff.Tag{##37#38,ForwardDiff.Dual{ForwardDiff.Tag{Tuple{##37#38,ForwardDiff.#gradient},Float64},Float64,1}},ForwardDiff.Dual{ForwardDiff.Tag{Tuple{##37#38,ForwardDiff.#gradient},Float64},Float64,1},1}

i.e., it is 'simpler'; that does indeed point in the direction of an inference problem. The fact that the tag parameter of a Dual can become so complicated in the presence of closures is a little worrisome.

tkoolen avatar Sep 25 '17 17:09 tkoolen

More evidence that this is a type inference problem (another part of tkoolen/RigidBodyDynamics.jl#347):

using ForwardDiff
@noinline function f267(z, x)
    @show Base.promote_op(Base.LinAlg.matprod, Float64, eltype(x))
    x[1]
end

z267 = ([(1, (2), [(3, (4, 5, [1, 2, (3, (4, 5), [5])]), (5))])])
let z = z267
    g = x -> f267(z, x)
    h = x -> g(x)
    @test ForwardDiff.hessian(h, [1.]) == zeros(1, 1)
end

Even with #247, this prints

Base.promote_op(Base.LinAlg.matprod, Float64, eltype(x)) = Any

tkoolen avatar Sep 25 '17 17:09 tkoolen

FWIW, I also encountered some weird generated function/inference heisenbugs as part of writing #247, at this part: https://github.com/simonbyrne/ForwardDiff.jl/blob/9684bac1bc21461f6030009a369246b4e7a90d5a/test/HessianTest.jl#L60-L66

It would generate a Tag where the function was of type T, presumably as part of the inference passes, which would stick around and cause conflicts.

The problem went away when I switched away from using the counter as a type parameter.

simonbyrne avatar Sep 25 '17 17:09 simonbyrne

With https://github.com/JuliaDiff/ForwardDiff.jl/pull/247 @tkoolen's test pass; is that because the generated tags are simpler, thereby hiding the type inference bug?

slightly off-topic: Is there a recommended (use-at-risk-of-perturbation-confusion) fallback that lets me manually specify the Tag parameter?

goretkin avatar Sep 25 '17 20:09 goretkin

With #247 @tkoolen's test pass; is that because the generated tags are simpler, thereby hiding the type inference bug?

Possibly.

Is there a recommended (use-at-risk-of-perturbation-confusion) fallback that lets me manually specify the Tag parameter?

You can opt of the tagging system (at the risk of pertubation confusion) by passing nothing as the function argument to the <:AbstractConfig constructors (see docs here).

jrevels avatar Sep 25 '17 21:09 jrevels

Running into this issue again.

I was wondering: could the F parameter of Tag just be replaced with a hash of the function type? The hash could maybe be computed in the generator of an @generated Tag constructor, so that there is no runtime overhead.

tkoolen avatar Mar 19 '18 21:03 tkoolen

Do you have a reproducible case? Both the above pass on my machine now.

simonbyrne avatar Mar 19 '18 21:03 simonbyrne

Yeah. I'll try to reduce it to something that doesn't require half the Julia ecosystem :-)

tkoolen avatar Mar 19 '18 21:03 tkoolen

Somewhat reduced example:

import ForwardDiff
import RigidBodyDynamics

struct Revolute{T} end
struct Fixed{T} end
struct Joint{T, JT} end
struct TypeSortedCollection{D, N} end
struct Dynamics{M, JointCollection, C, P} end
struct UJacobianWrapper{fType,tType,P} end

setparams! = (state, p) -> nothing
zero_control!(τ::AbstractVector, t, state) = nothing

TTag = ForwardDiff.Tag{UJacobianWrapper{Dynamics{Float64,TypeSortedCollection{Tuple{Array{Joint{Float64,Fixed{Float64}},1},Array{Joint{Float64,Revolute{Float64}},1}},2},typeof(zero_control!),typeof(setparams!)},Float64,Void},Float64}
T = ForwardDiff.Dual{TTag,Float64,4}
mechanism = RigidBodyDynamics.rand_tree_mechanism(Float64, [RigidBodyDynamics.Revolute{Float64} for i = 1 : 30]...)
RigidBodyDynamics.DynamicsResult{T}(mechanism)

with https://github.com/JuliaRobotics/RigidBodyDynamics.jl/commit/3ffac55766cf504d57abd009ec8570f193933cf5 and ForwardDiff 0.7.3.

This arose from https://github.com/JuliaRobotics/RigidBodySim.jl/pull/54, where a stiff integrator from DifferentialEquations.jl needs to take gradients of the dynamics.

There's probably something I can do to help out inference in this particular case, but I feel like this issue will just keep cropping up. Having such a crazy Tag type can't be good for compilation times either, I would assume.

tkoolen avatar Mar 19 '18 22:03 tkoolen

Having such a crazy Tag type can't be good for compilation times either, I would assume.

Yeah... it can get quite bad. But you can set the solver to use autodiff=false to probably get around this.

ChrisRackauckas avatar Mar 19 '18 22:03 ChrisRackauckas

Having such a crazy Tag type can't be good for compilation times either, I would assume.

Yeah, that's why we calculated a hash there instead of dropped the type in directly, but it looks like that got changed in #247...I'm not sure what the reasoning was there - maybe to make it easier to implement the extra tag checking stuff that went into that PR? Anyway, AFAICT there's nothing preventing us from going back to hashing the function type associated with the tag.

jrevels avatar Mar 20 '18 04:03 jrevels

I don't see why it should make much difference to compile times, but I will note that switching away from the hashing did fix the original problem in this issue.

simonbyrne avatar Mar 20 '18 16:03 simonbyrne

I don't see why it should make much difference to compile times

Yeah, this probably merits actual benchmarking.

switching away from the hashing did fix the original problem in this issue.

D'oh, I should've scrolled up and reread this thread before replying. Thanks for the reminder.

I wonder if switching both the Tag's type parameters to objectids would hit the same issue that was fixed by removing the symbol hashing. It'd certainly simplify the Tag leaftypes. Other than working around the bug, is there any reason why we should keep the full function/value type around?

jrevels avatar Mar 20 '18 17:03 jrevels

If we could get nicer printing of tags (for which I started writing https://github.com/simonbyrne/ShowTree.jl?), then it might make debugging easier as you know where the Duals originated.

simonbyrne avatar Mar 20 '18 17:03 simonbyrne

I tried to help the compiler in https://github.com/JuliaRobotics/RigidBodyDynamics.jl/pull/414, which worked (the somewhat-minimal example above now works ~~and infers correctly~~), but if I then remove the Tag constructor overloads in RigidBodySim and run the RigidBodySim tests, I get:

signal (4): Illegal instruction
while loading /home/twan/code/RigidBodyDynamics/v0.6/RigidBodySim/test/runtests.jl, in expression starting on line 244
Type at /home/twan/code/RigidBodyDynamics/v0.6/RigidBodyDynamics/src/dynamics_result.jl:45
getindex at /home/twan/code/RigidBodyDynamics/v0.6/RigidBodyDynamics/src/caches.jl:13
Dynamics at /home/twan/code/RigidBodyDynamics/v0.6/RigidBodySim/src/core.jl:77
TimeGradientWrapper at /home/twan/code/RigidBodyDynamics/v0.6/DiffEqDiffTools/src/function_wrappers.jl:7
unknown function (ip: 0x7fd9046f0249)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
derivative! at /home/twan/code/RigidBodyDynamics/v0.6/ForwardDiff/src/derivative.jl:66 [inlined]
derivative! at /home/twan/code/RigidBodyDynamics/v0.6/ForwardDiff/src/derivative.jl:63
unknown function (ip: 0x7fd9046f009a)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
derivative! at /home/twan/code/RigidBodyDynamics/v0.6/OrdinaryDiffEq/src/derivative_wrappers.jl:3
unknown function (ip: 0x7fd9046efdea)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:51
perform_step! at /home/twan/code/RigidBodyDynamics/v0.6/OrdinaryDiffEq/src/perform_step/rosenbrock_perform_step.jl:1108
unknown function (ip: 0x7fd9046ef6e0)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
solve! at /home/twan/code/RigidBodyDynamics/v0.6/OrdinaryDiffEq/src/solve.jl:335
unknown function (ip: 0x7fd9046e50b2)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
#solve#1370 at /home/twan/code/RigidBodyDynamics/v0.6/OrdinaryDiffEq/src/solve.jl:7
unknown function (ip: 0x7fd9046d5be0)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:51
#solve at ./<missing>:0
#solve at ./<missing>:0
unknown function (ip: 0x7fd9046d57ad)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
macro expansion at /home/twan/code/RigidBodyDynamics/v0.6/RigidBodySim/test/runtests.jl:255 [inlined]
macro expansion at ./test.jl:860 [inlined]
anonymous at ./<missing> (unknown line)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:589
jl_eval_module_expr at /buildworker/worker/package_linux64/build/src/toplevel.c:205
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:480
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:873
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:616
include_from_node1 at ./loading.jl:576
unknown function (ip: 0x7fd9344c25fb)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
include at ./sysimg.jl:14
unknown function (ip: 0x7fd9343596ab)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
process_options at ./client.jl:305
_start at ./client.jl:371
unknown function (ip: 0x7fd9344d0e58)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/ui/../src/julia.h:1424 [inlined]
true_main at /buildworker/worker/package_linux64/build/ui/repl.c:127
main at /buildworker/worker/package_linux64/build/ui/repl.c:264
__libc_start_main at /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
unknown function (ip: 0x4016bc)
Allocations: 116369338 (Pool: 116353506; Big: 15832); GC: 654

Although that probably warrants a Julia issue (if it hasn't been fixed yet on master), I think this is more evidence that having complicated tag types somehow pushes the compiler to its limits.

Edit: the "infers correctly" part is not true. Edit: this is with https://github.com/JuliaRobotics/RigidBodyDynamics.jl/pull/414/commits/3188f26944c96aea14c67e69b51f0a7ebdec566e.

tkoolen avatar Mar 20 '18 21:03 tkoolen