NNlib.jl
Incorrect gradient of convolution w.r.t. weights
```julia
import Random
import NNlib
import NNlib: DenseConvDims

Random.seed!(42);

# Central-difference numeric gradient of f w.r.t. each array in xs.
function ngradient(f, xs::AbstractArray...)
    grads = zero.(xs)
    for (x, Δ) in zip(xs, grads), i in 1:length(x)
        δ = sqrt(eps())
        tmp = x[i]
        x[i] = tmp - δ/2
        y1 = f(xs...)
        x[i] = tmp + δ/2
        y2 = f(xs...)
        x[i] = tmp
        Δ[i] = (y2 - y1) / δ
    end
    return grads
end
```
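As a quick sanity check of the central-difference scheme used above, here is a minimal standalone version validated against a known analytic gradient (a sketch; `centraldiff` is a hypothetical helper for illustration, not part of NNlib):

```julia
# Central difference (f(x+δ/2) - f(x-δ/2)) / δ on a single slot i,
# accurate to O(δ²); mirrors the loop body of ngradient above.
function centraldiff(f, x, i; δ=sqrt(eps()))
    tmp = x[i]
    x[i] = tmp - δ/2; y1 = f(x)
    x[i] = tmp + δ/2; y2 = f(x)
    x[i] = tmp
    return (y2 - y1) / δ
end

x = [1.0, 2.0, 3.0]
g = [centraldiff(v -> sum(v .^ 2), x, i) for i in eachindex(x)]
isapprox(g, 2 .* x; atol=1e-4)  # the exact gradient of sum(x.^2) is 2x
```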
```julia
function conv_loss(x, w)
    cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
    y = NNlib.conv(x, w, cdims)
    return sum(y)
end

x = rand(7, 7, 3, 10); w = rand(3, 3, 3, 1)
cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
y = NNlib.conv(x, w, cdims)
dy = ones(size(y))

ndx, ndw = ngradient(conv_loss, x, w)
dx = NNlib.∇conv_data(dy, w, cdims)
dw = NNlib.∇conv_filter(x, dy, cdims)

isapprox(dx, ndx, rtol=1e-5, atol=1e-5)  # true
isapprox(dw, ndw, rtol=1e-5, atol=1e-5)  # false
```
I recently updated NNlib from (I think) version 0.6.0 to the latest version 0.6.6, and my tests started to fail. `NNlib.∇conv_filter()` now differs from both the numeric approximation (`ngradient`) and the CUDNN implementation, and the difference is quite large (e.g. `[123, 128, ...]` vs `[112, 115, ...]`), so it's not just numeric instability.

I tried to trace back through the NNlib implementation, but large portions of the code were changed, including the tests. Does anybody know what may have caused this issue?
@dfdx for me the example script works fine on NNlib v0.6.6. Could you check if the problem still persists for you?
@CarloLucibello In your case, does the code run and show `true` on both `isapprox` lines?
I've just re-tested it and the problem is still present:
```julia
(@v1.4) pkg> st NNlib
Status `~/.julia/environments/v1.4/Project.toml`
  [872c559c] NNlib v0.6.6

julia> import Random

julia> import NNlib

julia> import NNlib: DenseConvDims

julia> Random.seed!(42);

julia> function ngradient(f, xs::AbstractArray...)
           grads = zero.(xs)
           for (x, Δ) in zip(xs, grads), i in 1:length(x)
               δ = sqrt(eps())
               tmp = x[i]
               x[i] = tmp - δ/2
               y1 = f(xs...)
               x[i] = tmp + δ/2
               y2 = f(xs...)
               x[i] = tmp
               Δ[i] = (y2-y1)/δ
           end
           return grads
       end
ngradient (generic function with 1 method)

julia> function conv_loss(x, w)
           cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
           y = NNlib.conv(x, w, cdims)
           return sum(y)
       end
conv_loss (generic function with 1 method)

julia> x = rand(7, 7, 3, 10); w = rand(3, 3, 3, 1);

julia> cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1);

julia> y = NNlib.conv(x, w, cdims);

julia> dy = ones(size(y));

julia> ndx, ndw = ngradient(conv_loss, x, w);

julia> dx = NNlib.∇conv_data(dy, w, cdims);

julia> dw = NNlib.∇conv_filter(x, dy, cdims);

julia> isapprox(dx, ndx, rtol=1e-5, atol=1e-5) # true
true

julia> isapprox(dw, ndw, rtol=1e-5, atol=1e-5) # false
false # <--- this should be true
```
> @CarloLucibello In your case, does the code work and show `true` on both lines?

Yes, I get `true` in both lines. Weird.
Do you know if it can depend on optional dependencies like NNPACK or something?
NNPACK looks like the only possible culprit. I don't have it:

```julia
julia> NNlib.is_nnpack_available()
false
```

Hm, for me it's also `false`.
Unless there's a better idea, I'll try it on a fresh installation of Julia later today, maybe another OS or something.
- Freshly installed Julia 1.4.1 on macOS: ok
- Re-created `.julia` directory from Julia on Linux Mint: broken
- Freshly installed Julia on Ubuntu 20.04: ok
So it's definitely something in my environment, but given that NNlib is pure Julia (except for NNPACK, which reportedly isn't used) and the Julia code is identical in all three cases, I don't really see where the difference could come from.
Are there any other dependencies on system libraries I'm missing?
If I activate NNPACK (by setting `ENV["NNLIB_USE_NNPACK"] = "true"`) and rebuild NNlib, the results are consistent with the numeric gradient:
Without NNPACK:
```julia
julia> dw[1:10] # NNlib's gradient
10-element Array{Float64,1}:
 77.24267625974863
 75.60615480391128
 75.97528738512742
 75.49146304275004
 73.38170666437483
 74.63221240244117
 71.85548233531136
 68.32006036928726
 69.4662272188591
 72.98388963724767

julia> ndw[1:10] # numeric gradient
10-element Array{Float64,1}:
 123.31095886230469
 122.17422485351562
 124.30633544921875
 122.17955017089844
 121.91676330566406
 125.45509338378906
 119.04501342773438
 115.77719116210938
 116.76512145996094
 122.80245971679688
```
With NNPACK:
```julia
julia> dw[1:10] # NNlib's gradient via NNPACK
10-element Array{Float64,1}:
 111.8546990507544
 108.91015741296108
 109.89558792982777
 109.61416120681227
 108.17139630935685
 110.89449325824589
 106.75714769997535
 103.4189106574869
 116.76510746200394
 122.80246892879077
```
Unfortunately, NNPACK performs a Float64 -> Float32 -> Float64 conversion, which destroys the numeric gradient (turning it mostly into zeros), so it cannot be used in the tests.
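The precision problem can be seen without NNlib at all: the perturbation `δ = sqrt(eps(Float64)) ≈ 1.5e-8` used by `ngradient` is smaller than Float32 resolution (`eps(Float32) ≈ 1.2e-7`), so after a round-trip through Float32 both perturbed evaluations collapse to the same value:

```julia
δ = sqrt(eps(Float64))          # ≈ 1.49e-8, the step used by ngradient
x = 1.0
y1 = Float64(Float32(x - δ/2))  # rounds back to 1.0
y2 = Float64(Float32(x + δ/2))  # rounds back to 1.0
(y2 - y1) / δ                   # 0.0 instead of the true derivative 1.0
```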
It seems like the default backend for `∇conv_filter` - `∇conv_filter_im2col` - has an element of randomness, which sometimes breaks the result. Here's a piece of a REPL session; note how the values change between calls (the first line of each output should be enough):
```julia
# wrong
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 99.2536 98.4235 106.757
 98.761 108.171 103.419
 99.9019 110.894 104.214
[:, :, 2, 1] =
 105.888 108.302 109.741
 105.845 107.46 114.824
 106.473 106.715 111.629
[:, :, 3, 1] =
 117.77 130.985 129.319
 115.742 130.63 129.331
 127.412 126.583 87.4029

# correct
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 115.174
 121.945 123.568 113.199
[:, :, 3, 1] =
 118.697 118.931 129.319
 117.099 118.254 129.331
 116.372 115.729 124.711

# correct
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711

# correct
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711

# wrong, different from the first case
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 87.7973 85.8581 84.9976
 85.497 84.262 81.9135
 85.4911 84.5851 81.1097
[:, :, 2, 1] =
 81.0013 82.4951 84.56
 81.72 82.6341 98.9791
 80.7205 80.6145 96.6074
[:, :, 3, 1] =
 106.067 106.032 115.456
 106.388 107.386 116.069
 105.687 103.338 111.824
```
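One way to surface this kind of flakiness programmatically is simply to compare repeated calls (a sketch; `f` stands for any zero-argument closure, such as `() -> NNlib.∇conv_filter_im2col(x, dy, cdims)`). Since nondeterminism like this often comes from multithreading, comparing `Threads.nthreads()` across the affected environments might also be informative (an assumption on my part, not something verified in this thread):

```julia
# Returns true if n repeated calls of f all produce an identical result.
function is_deterministic(f; n = 10)
    ref = f()
    return all(f() == ref for _ in 2:n)
end

is_deterministic(() -> 42)      # a pure function is deterministic
is_deterministic(() -> rand())  # fresh random draws are not
```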
The good news is that another backend - `∇conv_filter_direct` - works consistently and correctly:
```julia
julia> dw = NNlib.∇conv_filter_direct(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711

julia> dw = NNlib.∇conv_filter_direct(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711
```
(Five further calls to `NNlib.∇conv_filter_direct` produced identical output and are omitted here.)
I'm not in a good position to debug the implementation of `∇conv_filter_im2col()`, so in my library I just switched to the `_direct` version. Hopefully someone with better knowledge of im2col will be able to take a look at its implementation.
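For anyone wanting a reference to test against, the idea behind a direct filter gradient fits in a few lines of plain Julia. This is only a sketch of the simplest case (2-D, single channel, stride 1, no padding, no kernel flipping), and `filter_grad_direct` is a hypothetical helper, not NNlib's actual `∇conv_filter_direct`:

```julia
# For y[i, j] = Σ_{a,b} x[i+a-1, j+b-1] * w[a, b] (valid cross-correlation),
# the gradient of the loss w.r.t. the filter is
#   dL/dw[a, b] = Σ_{i,j} x[i+a-1, j+b-1] * dy[i, j],
# i.e. a cross-correlation of the input with the output sensitivities.
function filter_grad_direct(x::AbstractMatrix, dy::AbstractMatrix, kh::Int, kw::Int)
    dw = zeros(kh, kw)
    for a in 1:kh, b in 1:kw, i in 1:size(dy, 1), j in 1:size(dy, 2)
        dw[a, b] += x[i + a - 1, j + b - 1] * dy[i, j]
    end
    return dw
end

x  = rand(5, 5)
dy = ones(3, 3)                       # sensitivities of sum(y)
dw = filter_grad_direct(x, dy, 3, 3)  # dw[a, b] == sum(x[a:a+2, b:b+2])
```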
> It seems like the default backend for `∇conv_filter` - `∇conv_filter_im2col` - has an element of randomness, which sometimes breaks the result.
Not sure if this issue is related: I trained a model and found that even though I had set the RNG seed, the gradient of the convolution occasionally differs between runs (it's difficult for me to work out a MWE yet). This breaks reproducibility sometimes.
Looks like this was closed by #235 and just not updated? Feel free to re-open if that's not the case.