
[WIP] ForwardDiff AD

Open KirillZubov opened this issue 3 years ago • 22 comments

KirillZubov avatar Dec 06 '21 18:12 KirillZubov

What's the speed difference? High order?

ChrisRackauckas avatar Dec 06 '21 19:12 ChrisRackauckas

The speed difference isn't clear-cut for the 3rd-order ODE, but the accuracy of the AD solution is clearly higher. The numerical method is fast partly because it is less accurate. That is, at approximately the same execution speed for the 3rd-order case, AD gives the more accurate solution.

AD:
BFGS 7.006 s (8964020 allocations: 13.38 GiB)
ADAM 9.114 s (12620139 allocations: 18.74 GiB)

numeric:
BFGS 5.260 s (6739726 allocations: 10.56 GiB)
ADAM 9.743 s (11914038 allocations: 18.64 GiB)
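
(A sketch of the measurement pattern, for reproducibility; `prob` stands for the discretized PINN problem, as in the Poisson benchmarks later in this thread.)

using BenchmarkTools, GalacticOptim
# ADAM is the Flux optimizer and BFGS the Optim one, as used elsewhere in this thread.
@btime GalacticOptim.solve(prob, BFGS(); maxiters = 100);
@btime GalacticOptim.solve(prob, ADAM(0.1); maxiters = 100);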

KirillZubov avatar Dec 06 '21 20:12 KirillZubov

Cool, that's what I expected. What about on KS?

ChrisRackauckas avatar Dec 06 '21 21:12 ChrisRackauckas

KS requires 2 dimensions. I expect it to work well with AD. I will write a hardcoded 2-dim version tomorrow for the presentation, but it is really a parser issue: the variables need to be passed into the body of the derivative function.

using ForwardDiff, RuntimeGeneratedFunctions

# Recursively build a nested ForwardDiff.derivative expression;
# dervars_num[order] selects the variable differentiated at this nesting level.
function inner_parser_derivative(f, vars, dervars_num, order)
    der_vars = vars[dervars_num[order]]
    if order > 1
        # Outer derivative wraps the expression for the remaining (order-1) levels.
        return :(ForwardDiff.derivative($der_vars -> $(inner_parser_derivative(f, vars, dervars_num, order - 1)), $der_vars))
    else
        # Innermost level: differentiate the network evaluation itself.
        return :(ForwardDiff.derivative($der_vars -> $f($(vars...), θ), $der_vars)[1])
    end
end

function parser_derivative(f, vars, dervars_num)
    order = length(dervars_num)
    :(($(vars...), θ) -> $(inner_parser_derivative(f, vars, dervars_num, order)))
end

# Broadcast version: compile the scalar derivative, then broadcast it over the inputs.
function parser_derivative_broadcast(vars, dervars_num, expr_der_func)
    der_func = @RuntimeGeneratedFunction(expr_der_func)
    :(($(vars...), θ) -> ($der_func).($(vars...), Ref(θ)))
end
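
Hypothetical usage (matching how the parser is called later in this thread): generate and compile the second x-derivative of a network phi over variables (x, y); argument order as in the snippet above.

derivative_expr = parser_derivative(phi, [:x, :y], [1, 1])  # differentiate twice w.r.t. variable 1 (x)
dphi_xx = eval(derivative_expr)                             # or @RuntimeGeneratedFunction(derivative_expr)
dphi_xx(0.5, 0.5, initθ)                                    # second x-derivative at (0.5, 0.5)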

KirillZubov avatar Dec 06 '21 21:12 KirillZubov

It's a scaling thing. AD should scale as the power in the polynomial while numerical won't. AD should have worse scaling for higher-order derivatives (contrary to what most people would think! This is a case where AD is strictly worse than numerical! I mention it in the paper). This would be a very good thing to note. It follows from the discussion in Griewank's AD book.
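
A minimal sketch (not from this thread) of that scaling: each nesting level of forward-mode wraps the input in another layer of Dual numbers, so the work grows with derivative order, while a central-difference stencil stays at a fixed number of function evaluations per order.

using ForwardDiff, BenchmarkTools

f(x) = sin(x) * exp(-x^2)
d1(x) = ForwardDiff.derivative(f, x)    # Duals one level deep
d2(x) = ForwardDiff.derivative(d1, x)   # two levels deep
d3(x) = ForwardDiff.derivative(d2, x)   # three levels deep: cost grows with order

@btime f(0.5); @btime d1(0.5); @btime d2(0.5); @btime d3(0.5)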

ChrisRackauckas avatar Dec 06 '21 21:12 ChrisRackauckas

So far, AD is significantly slower than the numerical derivative.

Poisson's eq, 2D, maxiters=100

AD:
BFGS 39.180 s (33577736 allocations: 68.19 GiB)
ADAM 14.158 s (12608940 allocations: 24.47 GiB)

numeric:
BFGS 945.420 ms (1723964 allocations: 692.32 MiB)
ADAM 334.701 ms (629516 allocations: 240.37 MiB)

KirillZubov avatar Dec 07 '21 16:12 KirillZubov

I think the AD runtime can be sped up significantly, for example by configuring the chunk size: https://juliadiff.org/ForwardDiff.jl/v0.10.2/user/advanced.html#Configuring-Chunk-Size-1

KirillZubov avatar Dec 07 '21 16:12 KirillZubov

Just set the chunk size to 1 to see if there's a type-instability from it.
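
Concretely, something like this (a sketch; `loss` and `initθ` stand in for the actual loss function and parameter vector):

cfg = ForwardDiff.GradientConfig(loss, initθ, ForwardDiff.Chunk{1}())  # force chunk size 1
ForwardDiff.gradient(loss, initθ, cfg)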

ChrisRackauckas avatar Dec 07 '21 16:12 ChrisRackauckas

The Kuramoto–Sivashinsky equation looks good with ForwardDiff AD, but it still takes too long: BFGS, 300 iterations, KS, AD.

KirillZubov avatar Dec 07 '21 19:12 KirillZubov

I have a problem with RuntimeGeneratedFunctions.

using RuntimeGeneratedFunctions, DiffEqFlux, Flux, ForwardDiff
phi = FastChain(FastDense(1,12,Flux.tanh),FastDense(12,12,Flux.tanh),FastDense(12,1))
indvars = [:x]
undv = [1]
derivative_expr = parser_derivative(phi,indvars,undv)
derivative_expr =:((θ_, x)->begin
          #= none:3 =#
          ForwardDiff.derivative((x->begin
                      #= none:8 =#
                      (((x, θ_)->begin
                              #= none:3 =#
                              ((FastChain{Tuple{FastDense{typeof(tanh), DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{typeof(tanh), DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{typeof(identity), DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}}}((FastDense{typeof(tanh), DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}(12, 1, tanh, DiffEqFlux.var"#initial_params#90"{Vector{Float32}}(Float32[0.34942162, -0.6759936, 0.31213823, -0.6459028, -0.27329746, -0.49197906, 0.6756018, 0.5688718, -0.36240864, -0.6780105  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), true), FastDense{typeof(tanh), DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}(12, 12, tanh, DiffEqFlux.var"#initial_params#90"{Vector{Float32}}(Float32[0.19952452, 0.003932476, -0.16829717, -0.30636954, 0.11300719, 0.09094751, 0.29692435, 0.15857887, -0.29655933, -0.0685693  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), true), FastDense{typeof(identity), DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}(1, 12, identity, DiffEqFlux.var"#initial_params#90"{Vector{Float32}}(Float32[-0.5841597, -0.3415628, -0.17458333, -0.11816714, -0.3745709, 0.3589071, -0.2814319, -0.41690788, 0.6444232, 0.0038750547, 0.21813866, 0.6468031, 0.0]), true))))(vcat(x), θ_))[1]
                          end))(x, θ_)
                  end), x)
      end)

RuntimeGeneratedFunctions.init(@__MODULE__)
der2 = @RuntimeGeneratedFunction(derivative_expr)
julia> der2.(Ref(initθ),rand(1,10))
ERROR: The function body AST defined by this @generated function is not pure. This likely means it contains a closure or comprehension.
Stacktrace:
 [1] RuntimeGeneratedFunction
   @ ~/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl:117 [inlined]
 [2] _broadcast_getindex_evalf
   @ ./broadcast.jl:648 [inlined]
 [3] _broadcast_getindex
   @ ./broadcast.jl:621 [inlined]
 [4] getindex
   @ ./broadcast.jl:575 [inlined]
 [5] copy
   @ ./broadcast.jl:922 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, RuntimeGeneratedFunction{(:θ_, :x), var"#_RGF_ModTag", var"#_RGF_ModTag", (0x44ae534c, 0x7029fbab, 0x55027912, 0x1acfec15, 0x18543f01)}, Tuple{Base.RefValue{Vector{Float64}}, Matrix{Float64}}})
   @ Base.Broadcast ./broadcast.jl:883
 [7] top-level scope
   @ none:1
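
One possible workaround (a sketch, not verified here): keep the closure syntax out of the generated body by building the inner function at call time with Base.Fix2, so the RGF AST contains only ordinary calls.

u_scalar = (x, θ_) -> phi(vcat(x), θ_)[1]
# Base.Fix2(u_scalar, θ_) behaves like x -> u_scalar(x, θ_), but appears in the
# expression as a call rather than as closure syntax.
expr = :((θ_, x) -> ForwardDiff.derivative(Base.Fix2($u_scalar, θ_), x))
der = @RuntimeGeneratedFunction(expr)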

KirillZubov avatar Dec 21 '21 18:12 KirillZubov

I got that same error message when trying to implement the Integral with Interval(x, Inf).

if a isa Num
    a = NeuralPDE.build_symbolic_loss_function(nothing, indvars, depvars,
                                               dict_indvars, dict_depvars,
                                               dict_depvar_input, phi, derivative_,
                                               nothing, chain, initθ, strategy,
                                               integrand = a, integrating_depvars = integrating_depvars,
                                               param_estim = false, default_p = nothing)
...

killah-t-cell avatar Dec 21 '21 23:12 killah-t-cell

I still can't figure out how to use @RuntimeGeneratedFunction here: one way or another it needs to call an anonymous function inside the function body, which leads to the RuntimeGeneratedFunctions error. @ChrisRackauckas, do you have ideas?

using ForwardDiff, Flux, DiffEqFlux, RuntimeGeneratedFunctions
RuntimeGeneratedFunctions.init(@__MODULE__)

phi = FastChain(FastDense(2,12,Flux.tanh),FastDense(12,12,Flux.tanh),FastDense(12,1))
initθ = Float64.(DiffEqFlux.initial_params(phi))
u = (x,y,θ_) -> phi(vcat(x,y),θ_)[1]

z = function (x,y, θ)
    x -> u(x,y, θ)
end
x=1
y=1
z(x,y, initθ)
ForwardDiff.derivative(z(x,y, initθ),x)
func_der = (x,y, θ) -> ForwardDiff.derivative(x -> u(x,y, θ),x)
func_der(x,y, initθ)
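
# Everything up to here works: the closures are built by ordinary Julia code, so
# ForwardDiff.derivative receives a plain anonymous function. The failures below
# appear only once the same pattern is placed inside a RuntimeGeneratedFunction body.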

zzz = :(function (x,y, θ)
    x -> u(x,y, θ)
end)
zz_ = @RuntimeGeneratedFunction(zzz)

julia> zz_(1,1,initθ)(1)
ERROR: The function body AST defined by this @generated function is not pure. This likely means it contains a closure or comprehension.
Stacktrace:
 [1] (::RuntimeGeneratedFunction{(:x, :y, :θ), var"#_RGF_ModTag", var"#_RGF_ModTag", (0xc616339c, 0xb34a9fc4, 0xa68d2e38, 0x274a155f, 0x6f509da5)})(::Int64, ::Int64, ::Vector{Float64})
   @ RuntimeGeneratedFunctions ~/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl:117
 [2] top-level scope
   @ none:1

zz_ = @eval $zzz
zz_(1,1,initθ)(1)
func_der = (x,y, θ) -> ForwardDiff.derivative(zz_(x,y,initθ),x)
func_der(1,1,initθ)

someexpr = :((x,y,θ) -> ForwardDiff.derivative($zz_(x,y,θ),x))
zd = @RuntimeGeneratedFunction(someexpr)
zd(1,2,initθ)
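
# This call works: zz_ is interpolated as an already-constructed function object,
# so the RGF body contains only a call expression. The purity check trips only
# when the closure syntax itself appears inside the expr, as in someexpr2 below.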


someexpr2 = :((x,y,θ) -> ForwardDiff.derivative(x -> u(x,y, θ),x))
zd2= @RuntimeGeneratedFunction(someexpr2)
zd2(1,2,initθ)
ERROR: The function body AST defined by this @generated function is not pure. This likely means it contains a closure or comprehension.
Stacktrace:
 [1] (::RuntimeGeneratedFunction{(:x, :y, :θ), var"#_RGF_ModTag", var"#_RGF_ModTag", (0x00b2cc5f, 0xddfe1493, 0x62e6a275, 0x5ca8b534, 0x57f11384)})(::Int64, ::Int64, ::Vector{Float64})
   @ RuntimeGeneratedFunctions ~/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl:117
 [2] top-level scope
   @ none:1

KirillZubov avatar Dec 23 '21 14:12 KirillZubov

Derivative benchmarks.

using NeuralPDE, DiffEqFlux, Flux, DiffEqBase, ForwardDiff, Zygote, BenchmarkTools, Test

chain = FastChain(FastDense(2,16,Flux.σ),FastDense(16,16,Flux.σ),FastDense(16,1))
initθ = Float64.(DiffEqFlux.initial_params(chain))
eltypeθ = eltype(initθ)
parameterless_type_θ = DiffEqBase.parameterless_type(initθ)
phi = NeuralPDE.get_phi(chain, parameterless_type_θ)
u_ = (cord, θ, phi) -> phi(cord, θ)
_epsilon = one(eltypeθ) / (2 * cbrt(eps(eltypeθ)))  # 1/(2ε): central-difference prefactor
ε = cbrt(eps(eltypeθ))                              # finite-difference step

# Perturbation vector: cbrt(eps) in the `der_num`-th coordinate, zero elsewhere.
function get_ε(dim, der_num, eltypeθ)
    epsilon = cbrt(eps(eltypeθ))
    ε = zeros(eltypeθ, dim)
    ε[der_num] = epsilon
    ε
end

eps_x = get_ε(2, 1, Float64)
eps_y = get_ε(2, 2, Float64)
r = rand(1,10)
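
# For reference: the "numeric" derivative benchmarked below is a central difference
# along the coordinate selected by get_ε, i.e.
#   derivative(phi, u_, x, [eps_x], 1, θ) ≈ (phi(x .+ eps_x, θ) .- phi(x .- eps_x, θ)) .* _epsilon
# which is (u(x + ε) - u(x - ε)) / (2ε).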

#First order derivative

#ad
dphi_xa = eval(NeuralPDE.parser_derivative(phi,[:x,:y],[1]))
@benchmark dphi_xa.(Ref(initθ),r,r) setup=(r=rand(1,100))
BenchmarkTools.Trial: 7166 samples with 1 evaluation.
 Range (min … max):  288.637 μs … 189.844 ms  ┊ GC (min … max):  0.00% … 99.73%
 Time  (median):     383.253 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   707.494 μs ±   6.754 ms  ┊ GC (mean ± σ):  35.62% ±  3.72%

         █▅
  ▁▂▄▆▇▆▇██▅▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  289 μs           Histogram: frequency by time          955 μs <

 Memory estimate: 1.08 MiB, allocs estimate: 3214.

#numeric (uses derivative = NeuralPDE.get_numeric_derivative(), defined below)
@benchmark derivative(phi,u_,r,[eps_x],1,initθ) setup=(r=rand(2,100))

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   55.242 μs … 108.085 ms  ┊ GC (min … max):  0.00% … 99.85%
 Time  (median):      72.666 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   101.174 μs ±   1.081 ms  ┊ GC (mean ± σ):  10.67% ±  1.00%

  ▃▂▅▆▄█▇▃▂▁  ▁▁▁▂▄▅▅▄▃▃▂▁▁▁▁▁▂▁                                ▂
  █████████████████████████████████████▇█▇▇▇▇▇▆▆▆▇▆▇▆▅▆▆▅▆▅▆▅▆▅ █
  55.2 μs       Histogram: log(frequency) by time        225 μs <

 Memory estimate: 126.77 KiB, allocs estimate: 60.


## aa: second order, AD of AD
dphi_xxaa = eval(NeuralPDE.parser_derivative(phi, [:x,:y], [1,1]))

## na: second order, AD applied to a numeric first derivative
dphi_xn = (x,θ) -> (phi(x .+ eps_x, θ) .- phi(x .- eps_x, θ)) .* _epsilon
dphi_xn([1,2], initθ)
dphi_xxna = eval(NeuralPDE.parser_derivative(dphi_xn, [:x,:y], [1]))

## an: second order, numeric difference of an AD first derivative
dphi_xa = eval(NeuralPDE.parser_derivative(phi, [:x,:y], [1]))
dphi_xxan = (θ,x,y) -> (dphi_xa.(Ref(θ), x .+ ε, y) .- dphi_xa.(Ref(θ), x .- ε, y)) .* _epsilon

## nn: second order, fully numeric
derivative = NeuralPDE.get_numeric_derivative()
dphi_xxnn = (θ,cord) -> derivative(phi, u_, cord, [eps_x,eps_x], 2, θ)

phi_ = (p) -> phi(p, initθ)[1]
hess_phi = Zygote.hessian(phi_, [1,2])   # reference second derivative

hess_phi[1]
dphi_xxaa(initθ, 1, 2)
dphi_xxna(initθ, 1, 2)
dphi_xxan(initθ, 1, 2)
dphi_xxnn(initθ, [1,2])

@test isapprox(hess_phi[1], dphi_xxaa(initθ,1,2), atol=1e-12)
@test isapprox(hess_phi[1], dphi_xxna(initθ,1,2), atol=1e-12)
@test isapprox(hess_phi[1], dphi_xxan(initθ,1,2), atol=1e-12)
@test isapprox(hess_phi[1], dphi_xxnn(initθ,[1,2])[1], atol=1e-6)


#Second order derivative
#numeric
@benchmark dphi_xxnn(initθ,r) setup=(r=rand(2,100))

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  116.201 μs … 118.110 ms  ┊ GC (min … max):  0.00% … 99.72%
 Time  (median):     151.786 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   219.325 μs ±   1.943 ms  ┊ GC (mean ± σ):  15.30% ±  1.73%

    ▄▅▅██▅▄▃▂▂▃▃▂▁▁▂▄▄▅▅▅▄▃▃▃▂▂▂▂▂▁▁▁▁▁ ▁                       ▂
  ███████████████████████████████████████████▇███▇▆▇▆▇▇▆▇▆▇▆▆▆▇ █
  116 μs        Histogram: log(frequency) by time        400 μs <

 Memory estimate: 257.11 KiB, allocs estimate: 126.

#ad
@benchmark dphi_xxaa.(Ref(initθ),r,r) setup=(r=rand(1,100))
BenchmarkTools.Trial: 6281 samples with 1 evaluation.
 Range (min … max):  392.570 μs … 175.002 ms  ┊ GC (min … max):  0.00% … 99.54%
 Time  (median):     492.478 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   780.928 μs ±   6.247 ms  ┊ GC (mean ± σ):  30.24% ±  3.77%

          █▇▁
  ▁▃▅▇▇▆▄▃███▆▄▄▃▄▃▃▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  393 μs           Histogram: frequency by time          995 μs <

 Memory estimate: 1.20 MiB, allocs estimate: 3314.

#ad + numeric
@benchmark dphi_xxan.(Ref(initθ),r,r) setup=(r=rand(1,100))
BenchmarkTools.Trial: 2375 samples with 1 evaluation.
 Range (min … max):  1.210 ms … 185.799 ms  ┊ GC (min … max):  0.00% … 98.84%
 Time  (median):     1.525 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   2.085 ms ±   8.590 ms  ┊ GC (mean ± σ):  20.59% ±  4.95%

      ▁   ▃██▂▁
  ▂▃▃▄█▇▇██████▇▅▅▅▄▄▄▄▃▃▃▃▃▃▃▄▃▄▃▄▃▃▃▃▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂ ▃
  1.21 ms         Histogram: frequency by time        2.89 ms <

 Memory estimate: 2.20 MiB, allocs estimate: 8214.

#numeric + ad
@benchmark dphi_xxna.(Ref(initθ),r,r) setup=(r=rand(1,100))
BenchmarkTools.Trial: 2835 samples with 1 evaluation.
 Range (min … max):  870.898 μs … 187.949 ms  ┊ GC (min … max):  0.00% … 99.18%
 Time  (median):       1.152 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.775 ms ±   9.307 ms  ┊ GC (mean ± σ):  27.79% ±  5.26%

          ▃█▅
  ▁▂▃▅█▇▆█████▆▅▄▃▃▄▃▄▃▃▄▃▃▄▃▃▃▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁ ▃
  871 μs           Histogram: frequency by time         2.32 ms <

 Memory estimate: 2.18 MiB, allocs estimate: 6814.

#backprop
#ForwardDiff
#numeric
r=rand(2,100)
@benchmark ForwardDiff.gradient(θ->sum(dphi_xxnn(θ,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 66 samples with 1 evaluation.
 Range (min … max):  58.578 ms … 155.051 ms  ┊ GC (min … max):  0.00% … 54.71%
 Time  (median):     67.640 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   75.757 ms ±  24.971 ms  ┊ GC (mean ± σ):  11.30% ± 17.06%

       █▁
  ▄▄▅█████▃▆▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▃▃▁▃ ▁
  58.6 ms         Histogram: frequency by time          153 ms <

 Memory estimate: 88.87 MiB, allocs estimate: 6559.

r=rand(1,100)
#ad
@benchmark ForwardDiff.gradient(θ->sum(dphi_xxaa.(Ref(θ),r,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 31 samples with 1 evaluation.
 Range (min … max):   74.627 ms … 304.291 ms  ┊ GC (min … max):  0.00% … 48.78%
 Time  (median):     115.402 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   162.370 ms ±  82.649 ms  ┊ GC (mean ± σ):  35.01% ± 26.95%

  █▄▁  ▁  ▁                                      █
  ███▁▁█▁▆█▆▆▁▁▁▁▆▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▆▁▁▁▁▆▆▁█▆▆▁▁▆▁▆▁▁▁▁▁▆ ▁
  74.6 ms          Histogram: frequency by time          304 ms <

 Memory estimate: 368.69 MiB, allocs estimate: 104840.

#ad + numeric
@benchmark ForwardDiff.gradient(θ->sum(dphi_xxan.(Ref(θ),r,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  148.058 ms … 388.769 ms  ┊ GC (min … max):  0.00% … 43.44%
 Time  (median):     315.969 ms               ┊ GC (median):    44.01%
 Time  (mean ± σ):   291.624 ms ±  77.209 ms  ┊ GC (mean ± σ):  39.03% ± 19.03%

    ▃                                       █  ▃▃
  ▇▁█▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▇▇█▁▁██▇▁▁▇▁▁▇▁▁▁▁▁▁▇ ▁
  148 ms           Histogram: frequency by time          389 ms <

 Memory estimate: 660.88 MiB, allocs estimate: 264340.

#numeric + ad
@benchmark ForwardDiff.gradient(θ->sum(dphi_xxna.(Ref(θ),r,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  138.084 ms … 378.683 ms  ┊ GC (min … max):  0.00% … 42.60%
 Time  (median):     321.987 ms               ┊ GC (median):    44.54%
 Time  (mean ± σ):   295.129 ms ±  81.405 ms  ┊ GC (mean ± σ):  38.62% ± 18.68%

  ▁                                            ▁█    ▁
  █▁▁▁▆▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▆▆█▆▁▆▁▆▁▁▁▆ ▁
  138 ms           Histogram: frequency by time          379 ms <

 Memory estimate: 658.75 MiB, allocs estimate: 215040.

#Zygote
r=rand(2,100)
#numeric
@benchmark Zygote.gradient(θ->sum(dphi_xxnn(θ,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 5587 samples with 1 evaluation.
 Range (min … max):  643.525 μs … 129.094 ms  ┊ GC (min … max):  0.00% … 98.66%
 Time  (median):     727.925 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   890.099 μs ±   3.607 ms  ┊ GC (mean ± σ):  12.06% ±  2.96%

       █▃
  ▁▂▂▃▅██▆▄▃▃▃▄▅▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  644 μs           Histogram: frequency by time         1.26 ms <

 Memory estimate: 880.38 KiB, allocs estimate: 1678.


r=rand(1,100)
#ad
@benchmark Zygote.gradient(θ->sum(dphi_xxaa.(Ref(θ),r,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 4842 samples with 1 evaluation.
 Range (min … max):  650.008 μs … 163.139 ms  ┊ GC (min … max):  0.00% … 99.42%
 Time  (median):     740.486 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.029 ms ±   6.268 ms  ┊ GC (mean ± σ):  24.73% ±  4.04%

        ▄██▂
  ▃▄▅▆▇██████▆▅▅▄▅▆▅▄▄▃▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂ ▃
  650 μs           Histogram: frequency by time         1.28 ms <

 Memory estimate: 1.52 MiB, allocs estimate: 5322.

#ad + numeric
@benchmark Zygote.gradient(θ->sum(dphi_xxan.(Ref(θ),r,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 1439 samples with 1 evaluation.
 Range (min … max):  2.571 ms … 160.798 ms  ┊ GC (min … max):  0.00% … 97.86%
 Time  (median):     2.876 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.469 ms ±   9.076 ms  ┊ GC (mean ± σ):  15.41% ±  5.77%

      ▇▇█▇▅▂▅▄▁▂▂▄▃▃▁
  ▂▃▆████████████████▇▆▄▄▃▄▄▃▃▄▃▃▃▃▃▂▂▂▂▁▂▂▃▁▂▃▂▂▂▂▁▂▁▂▂▂▂▂▂▂ ▄
  2.57 ms         Histogram: frequency by time        4.16 ms <

 Memory estimate: 3.04 MiB, allocs estimate: 21631.

#numeric + ad
@benchmark Zygote.gradient(θ->sum(dphi_xxna.(Ref(θ),r,r)),inθ) setup=(inθ=initθ)
BenchmarkTools.Trial: 2761 samples with 1 evaluation.
 Range (min … max):  991.767 μs … 190.764 ms  ┊ GC (min … max):  0.00% … 98.95%
 Time  (median):       1.248 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.804 ms ±   9.277 ms  ┊ GC (mean ± σ):  27.59% ±  5.33%

       ▇▅▇▄▃ ▁▂▆█▅
  ▂▂▃▅▄█████████████▆▅▄▃▃▄▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▁▁▁ ▃
  992 μs           Histogram: frequency by time         2.14 ms <

 Memory estimate: 2.51 MiB, allocs estimate: 8822.

KirillZubov avatar Jan 03 '22 17:01 KirillZubov

Poisson equation benchmarks

@parameters x y
@variables u(..)
Dxx = Differential(x)^2
Dyy = Differential(y)^2
eq  = Dxx(u(x,y)) + Dyy(u(x,y)) ~ -sin(pi*x)*sin(pi*y)
bcs = [u(0,y) ~ 0.0, u(1,y) ~ -sin(pi*1)*sin(pi*y),
       u(x,0) ~ 0.0, u(x,1) ~ -sin(pi*x)*sin(pi*1)]
domains = [x ∈ Interval(0.0,1.0),
           y ∈ Interval(0.0,1.0)]

fastchain = FastChain(FastDense(2,12,Flux.σ),FastDense(12,12,Flux.σ),FastDense(12,1))

initθ = Float64.(DiffEqFlux.initial_params(fastchain))
grid_strategy = NeuralPDE.GridTraining(0.1)
discretization = NeuralPDE.PhysicsInformedNN(fastchain,
                                             grid_strategy;
                                             init_params = initθ,
                                             AD =false)

@named pde_system = PDESystem(eq,bcs,domains,[x,y],[u(x, y)])
prob_numeric = NeuralPDE.discretize(pde_system,discretization)

discretization = NeuralPDE.PhysicsInformedNN(fastchain,
                                             grid_strategy;
                                             init_params = initθ,
                                             AD =true)
prob_AD = NeuralPDE.discretize(pde_system,discretization)

@btime res = GalacticOptim.solve(prob_numeric, ADAM(0.1); maxiters=500);
  5.896 s (3143007 allocations: 1.17 GiB)
@btime res = GalacticOptim.solve(prob_numeric, BFGS(); maxiters=500);
 25.525 s (15823593 allocations: 6.05 GiB)
@btime res = GalacticOptim.solve(prob_AD, ADAM(0.1); maxiters=500);
 60.659 s (55159507 allocations: 121.93 GiB)
@btime res = GalacticOptim.solve(prob_AD, BFGS(); maxiters=500);
 199.556 s (142649297 allocations: 315.47 GiB)

KirillZubov avatar Jan 04 '22 13:01 KirillZubov

Runtime dependence on the size of the neural network

num = ...
fastchain = FastChain(FastDense(2,num,Flux.σ),FastDense(num,num,Flux.σ),FastDense(num,1))

initθ = Float64.(DiffEqFlux.initial_params(fastchain))
grid_strategy = NeuralPDE.GridTraining(0.1)
quasirandom_strategy = NeuralPDE.QuasiRandomTraining(50;
                                                     bcs_points= 10,
                                                     sampling_alg = LatinHypercubeSample())
discretization = NeuralPDE.PhysicsInformedNN(fastchain,
                                             grid_strategy;
                                             init_params = initθ,
                                             AD =false)

@named pde_system = PDESystem(eq,bcs,domains,[x,y],[u(x, y)])
prob_numeric = NeuralPDE.discretize(pde_system,discretization)

discretization = NeuralPDE.PhysicsInformedNN(fastchain,
                                             grid_strategy;
                                             init_params = initθ,
                                             AD =true)
prob_AD = NeuralPDE.discretize(pde_system,discretization)

#num =4
@btime GalacticOptim.solve(prob_numeric, ADAM(0.1); maxiters=10);
  14.994 ms (62886 allocations: 15.96 MiB)
@btime  GalacticOptim.solve(prob_numeric, BFGS(); maxiters=10);
  48.262 ms (169923 allocations: 43.27 MiB)
@btime GalacticOptim.solve(prob_AD, ADAM(0.1); maxiters=10);
  48.977 ms (282046 allocations: 148.32 MiB)
@btime res_ad = GalacticOptim.solve(prob_AD, BFGS(); maxiters=10);
  140.144 ms (761654 allocations: 400.64 MiB)

#num =8
@btime GalacticOptim.solve(prob_numeric, ADAM(0.1); maxiters=10);
  19.958 ms (62885 allocations: 19.77 MiB)
@btime  GalacticOptim.solve(prob_numeric, BFGS(); maxiters=10);
  74.776 ms (232741 allocations: 74.36 MiB)
 @btime GalacticOptim.solve(prob_AD, ADAM(0.1); maxiters=10);
  324.239 ms (564645 allocations: 722.51 MiB)
@btime res_ad = GalacticOptim.solve(prob_AD, BFGS(); maxiters=10);
  1.384 s (2089252 allocations: 2.61 GiB)

# num = 12
@btime GalacticOptim.solve(prob_numeric, ADAM(0.1); maxiters=10);
  21.025 ms (62884 allocations: 24.03 MiB)
@btime  GalacticOptim.solve(prob_numeric, BFGS(); maxiters=10);
  68.583 ms (182490 allocations: 74.26 MiB)
@btime GalacticOptim.solve(prob_AD, ADAM(0.1); maxiters=10);
  1.253 s (1103214 allocations: 2.44 GiB)
@btime res_ad = GalacticOptim.solve(prob_AD, BFGS(); maxiters=10);
  3.476 s (3089135 allocations: 6.83 GiB)

# num = 16
@btime GalacticOptim.solve(prob_numeric, ADAM(0.1); maxiters=10);
  22.563 ms (62884 allocations: 28.65 MiB)
@btime  GalacticOptim.solve(prob_numeric, BFGS(); maxiters=10);
  83.655 ms (188771 allocations: 98.19 MiB)
@btime GalacticOptim.solve(prob_AD, ADAM(0.1); maxiters=10);
  3.008 s (1844114 allocations: 6.05 GiB)
@btime res_ad = GalacticOptim.solve(prob_AD, BFGS(); maxiters=10);
  9.464 s (5532460 allocations: 18.18 GiB)

KirillZubov avatar Jan 04 '22 13:01 KirillZubov

Poisson's equation: prediction accuracy with automatic vs. numerical differentiation

num = 5
fastchain = FastChain(FastDense(2,num,Flux.σ),FastDense(num,num,Flux.σ),FastDense(num,1))

initθ = Float64.(DiffEqFlux.initial_params(fastchain))
grid_strategy = NeuralPDE.GridTraining(0.05)
discretization = NeuralPDE.PhysicsInformedNN(fastchain,
                                             grid_strategy;
                                             init_params = initθ,
                                             AD =false)

@named pde_system = PDESystem(eq,bcs,domains,[x,y],[u(x, y)])
prob_numeric = NeuralPDE.discretize(pde_system,discretization)

discretization = NeuralPDE.PhysicsInformedNN(fastchain,
                                             grid_strategy;
                                             init_params = initθ,
                                             AD =true)
prob_AD = NeuralPDE.discretize(pde_system,discretization)

res_num = GalacticOptim.solve(prob_numeric, BFGS();cb=cb, maxiters=2000)
res_ad = GalacticOptim.solve(prob_AD, BFGS();cb=cb, maxiters=2000)
phi = discretization.phi

xs,ys = [infimum(d.domain):0.01:supremum(d.domain) for d in domains]
analytic_sol_func(x,y) = (sin(pi*x)*sin(pi*y))/(2pi^2)

u_predict_num = reshape([first(phi([x,y],res_num.minimizer)) for x in xs for y in ys],(length(xs),length(ys)))
u_predict_ad = reshape([first(phi([x,y],res_ad.minimizer)) for x in xs for y in ys],(length(xs),length(ys)))
u_real = reshape([analytic_sol_func(x,y) for x in xs for y in ys], (length(xs),length(ys)))
diff_u_num = abs.(u_predict_num .- u_real)
diff_u_ad = abs.(u_predict_ad .- u_real)

@test u_predict_num ≈ u_real rtol = 0.02
@test u_predict_ad ≈ u_real rtol = 0.002

p1 = plot(xs, ys, u_real, linetype=:contourf,title = "analytic");
p2 = plot(xs, ys, u_predict_ad, linetype=:contourf,title = "predict_ad");
p3 = plot(xs, ys, u_predict_num, linetype=:contourf,title = "predict_num");
p4 = plot(xs, ys,diff_u_ad,linetype=:contourf,title = "error_ad");
p5 = plot(xs, ys, diff_u_num,linetype=:contourf,title = "error_num");
plot(p1,p2,p3)
plot(p4,p5)

(Figure 1: contour plots "analytic", "predict_ad", "predict_num")

(Figure 2: contour plots "error_ad", "error_num")

using Statistics
julia> mean(diff_u_num)
0.0003909686701947168

julia> mean(diff_u_ad)
2.3798038490057497e-5

julia> std(diff_u_num)
0.0003120935609823069

julia> std(diff_u_ad)
1.9645709644550055e-5


@btime res_num = GalacticOptim.solve(prob_numeric, BFGS(); maxiters=2000)
 13.649 s (24797861 allocations: 13.01 GiB)
@btime res_ad = GalacticOptim.solve(prob_AD, BFGS(); maxiters=2000)
 314.281 s (734773650 allocations: 501.37 GiB)

KirillZubov avatar Jan 04 '22 15:01 KirillZubov

A problem with opaque_closure:

julia>  VERSION
v"1.7.1"

using NeuralPDE, DiffEqFlux, Flux, DiffEqBase, ForwardDiff, RuntimeGeneratedFunctions
RuntimeGeneratedFunctions.init(@__MODULE__)

chain = FastChain(FastDense(1,16,Flux.σ),FastDense(16,16,Flux.σ),FastDense(16,1))
initθ = Float64.(DiffEqFlux.initial_params(chain))
phi_ = NeuralPDE.get_phi(chain,DiffEqBase.parameterless_type(initθ))
u_(x,θ_) = phi_(vcat(x),θ_)[1]
u_(1, initθ)
inner_expr =:(x->$u_(x,θ_))
inner_func = @RuntimeGeneratedFunction(inner_expr)
expr = :((x,θ_)->ForwardDiff.derivative($inner_expr,x))
func = @RuntimeGeneratedFunction(expr)

RuntimeGeneratedFunction(#=in Main=#, #=using Main=#, :((x, θ_)->begin
          #= none:1 =#
          ForwardDiff.derivative($(Expr(:opaque_closure, :(x->begin
          #= none:1 =#
          (u_)(x, θ_)
      end))), x)
      end))

func(1,initθ)
ForwardDiff.derivative(x->func(x,initθ),1)

ForwardDiff.gradient(initθ->func(1,initθ),initθ)
ERROR: Cannot determine ordering of Dual tags ForwardDiff.Tag{Core.OpaqueClosure{Tuple{Any}, Any}, Int64} and ForwardDiff.Tag{var"#174#175", Float64}
Stacktrace:
  [1] partials
    @ ~/.julia/packages/ForwardDiff/CkdHU/src/dual.jl:111 [inlined]
  [2] extract_derivative(#unused#::Type{ForwardDiff.Tag{Core.OpaqueClosure{Tuple{Any}, Any}, Int64}}, y::ForwardDiff.Dual{ForwardDiff.Tag{var"#174#175", Float64}, ForwardDiff.Dual{ForwardDiff.Tag{Core.OpaqueClosure{Tuple{Any}, Any}, Int64}, Float64, 1}, 12})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/derivative.jl:81
  [3] derivative(f::Core.OpaqueClosure{Tuple{Any}, Any}, x::Int64)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/derivative.jl:14
  [4] macro expansion
    @ ./none:1 [inlined]
  [5] macro expansion
    @ ~/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl:129 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] generated_callfunc
    @ ./none:0 [inlined]
  [8] (::RuntimeGeneratedFunction{(:x, :θ_), var"#_RGF_ModTag", var"#_RGF_ModTag", (0x3adef619, 0x6cabbff9, 0x73bfdf49, 0x6187b365, 0xa07ef6b1)})(::Int64, ::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12}})
    @ RuntimeGeneratedFunctions ~/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl:117
  [9] (::var"#174#175")(initθ::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12}})
    @ Main ./none:1
 [10] chunk_mode_gradient(f::var"#174#175", x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/gradient.jl:150
 [11] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12}}}, ::Val{true})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/gradient.jl:21
 [12] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#174#175", Float64}, Float64, 12}}}) (repeats 2 times)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/gradient.jl:17
 [13] top-level scope
    @ none:1

KirillZubov avatar Jan 13 '22 12:01 KirillZubov

@ChrisRackauckas do you have any ideas on how to deal with this?

KirillZubov avatar Jan 13 '22 14:01 KirillZubov

Add Dual and Tags for Core.OpaqueClosure?

KirillZubov avatar Jan 13 '22 14:01 KirillZubov

Did you try nesting RGFs?

ChrisRackauckas avatar Jan 15 '22 22:01 ChrisRackauckas

Yes, it is the same error.

using RuntimeGeneratedFunctions, ForwardDiff, Flux, DiffEqFlux, NeuralPDE, DiffEqBase
RuntimeGeneratedFunctions.init(@__MODULE__)

chain = FastChain(FastDense(1,16,Flux.σ),FastDense(16,16,Flux.σ),FastDense(16,1))
initθ = Float64.(DiffEqFlux.initial_params(chain))
phi_ = NeuralPDE.get_phi(chain,DiffEqBase.parameterless_type(initθ))
u_(x,θ_) = phi_(vcat(x),θ_)[1]
u_(1, initθ)

inner_expr =:(θ->(x-> $u_(x,θ))) #nested
inner_func = @RuntimeGeneratedFunction(inner_expr)
# inner_func = eval(inner_expr)
inner_func(initθ)(1)

expr = :((x,θ)->ForwardDiff.derivative($inner_func(θ),x))
func = @RuntimeGeneratedFunction(expr)
# func = eval(expr)
func(1,initθ)

RuntimeGeneratedFunction(#=in Main=#, #=using Main=#, :((x, θ)->begin
          #= none:1 =#
          ForwardDiff.derivative((RuntimeGeneratedFunction{(:θ,), var"#_RGF_ModTag", var"#_RGF_ModTag", (0xd45d54ce, 0x11696e8f, 0x9d485e30, 0x8cfdc8bb, 0xdb9be5d1)}(quote
    #= none:3 =#
    $(Expr(:opaque_closure, :(x->begin
          #= none:3 =#
          (u_)(x, θ)
      end)))
end))(θ), x)
      end))

func.(rand(1,10),Ref(initθ))
ForwardDiff.gradient(x->sum(func.(x,Ref(initθ))),rand(1,10))
ForwardDiff.derivative(x->func(x,initθ),1)
ForwardDiff.gradient(initθ->func(1,initθ),initθ)

ERROR: Cannot determine ordering of Dual tags ForwardDiff.Tag{Core.OpaqueClosure{Tuple{Any}, Any}, Int64} and ForwardDiff.Tag{var"#176#177", Float64}
Stacktrace:
  [1] partials
    @ ~/.julia/packages/ForwardDiff/CkdHU/src/dual.jl:111 [inlined]
  [2] extract_derivative(#unused#::Type{ForwardDiff.Tag{Core.OpaqueClosure{Tuple{Any}, Any}, Int64}}, y::ForwardDiff.Dual{ForwardDiff.Tag{var"#176#177", Float64}, ForwardDiff.Dual{ForwardDiff.Tag{Core.OpaqueClosure{Tuple{Any}, Any}, Int64}, Float64, 1}, 12})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/derivative.jl:81
  [3] derivative(f::Core.OpaqueClosure{Tuple{Any}, Any}, x::Int64)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/derivative.jl:14
  [4] macro expansion
    @ ./none:1 [inlined]
  [5] macro expansion
    @ ~/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl:129 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] generated_callfunc
    @ ./none:0 [inlined]
  [8] (::RuntimeGeneratedFunction{(:x, :θ), var"#_RGF_ModTag", var"#_RGF_ModTag", (0x65de165a, 0x715b7b1c, 0x390e3fc0, 0x84edfe71, 0xaf23b923)})(::Int64, ::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12}})
    @ RuntimeGeneratedFunctions ~/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl:117
  [9] (::var"#176#177")(initθ::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12}})
    @ Main ./none:1
 [10] chunk_mode_gradient(f::var"#176#177", x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/gradient.jl:150
 [11] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12}}}, ::Val{true})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/gradient.jl:21
 [12] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#176#177", Float64}, Float64, 12}}}) (repeats 2 times)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/CkdHU/src/gradient.jl:17
 [13] top-level scope
    @ none:1

KirillZubov avatar Jan 16 '22 13:01 KirillZubov

https://www.stochasticlifestyle.com/improved-forwarddiff-jl-stacktraces-with-package-tags/ should help
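
Along the lines of that post, a sketch (hypothetical NeuralPDETag; unverified against this issue) of giving the package its own tag so the ordering of nested Dual tags is well defined:

using ForwardDiff

struct NeuralPDETag end  # hypothetical package-level tag type

function tagged_derivative(f, x::T) where {T<:Real}
    TagT = typeof(ForwardDiff.Tag(NeuralPDETag(), T))
    xdual = ForwardDiff.Dual{TagT}(x, one(T))   # seed the derivative with an explicit tag
    ForwardDiff.partials(f(xdual), 1)           # extract df/dx
end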

ChrisRackauckas avatar Jan 18 '22 02:01 ChrisRackauckas