NNlib.jl
Incorrect gradient of convolution w.r.t. weights
```julia
import Random
import NNlib
import NNlib: DenseConvDims

Random.seed!(42);

# Central-difference numeric gradient of f w.r.t. each array in xs.
function ngradient(f, xs::AbstractArray...)
    grads = zero.(xs)
    for (x, Δ) in zip(xs, grads), i in 1:length(x)
        δ = sqrt(eps())
        tmp = x[i]
        x[i] = tmp - δ/2
        y1 = f(xs...)
        x[i] = tmp + δ/2
        y2 = f(xs...)
        x[i] = tmp
        Δ[i] = (y2 - y1) / δ
    end
    return grads
end
```
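As a quick sanity check of the central-difference scheme used above, here is a minimal standalone version validated against a known analytic gradient (a sketch; `centraldiff` is a hypothetical helper for illustration, not part of NNlib):

```julia
# Central difference (f(x+δ/2) - f(x-δ/2)) / δ on a single slot i,
# accurate to O(δ²); mirrors the loop body of ngradient above.
function centraldiff(f, x, i; δ=sqrt(eps()))
    tmp = x[i]
    x[i] = tmp - δ/2; y1 = f(x)
    x[i] = tmp + δ/2; y2 = f(x)
    x[i] = tmp
    return (y2 - y1) / δ
end

x = [1.0, 2.0, 3.0]
g = [centraldiff(v -> sum(v .^ 2), x, i) for i in eachindex(x)]
isapprox(g, 2 .* x; atol=1e-4)  # the exact gradient of sum(x.^2) is 2x
```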
```julia
function conv_loss(x, w)
    cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
    y = NNlib.conv(x, w, cdims)
    return sum(y)
end

x = rand(7, 7, 3, 10); w = rand(3, 3, 3, 1)
cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
y = NNlib.conv(x, w, cdims)
dy = ones(size(y))

ndx, ndw = ngradient(conv_loss, x, w)
dx = NNlib.∇conv_data(dy, w, cdims)
dw = NNlib.∇conv_filter(x, dy, cdims)

isapprox(dx, ndx, rtol=1e-5, atol=1e-5)  # true
isapprox(dw, ndw, rtol=1e-5, atol=1e-5)  # false
```
I recently updated NNlib from (I think) version 0.6.0 to the latest version 0.6.6, and my tests started to fail. `NNlib.∇conv_filter()` now differs from both the numeric approximation (`ngradient`) and the CUDNN implementation, and the difference is quite large (e.g. `[123, 128, ...]` vs `[112, 115, ...]`), so it's not just numeric instability.

I tried to trace back through the NNlib implementation, but large portions of the code were changed, including the tests. Does anybody know what may have caused this issue?
@dfdx for me the example script works fine on NNlib v0.6.6. Could you check if the problem still persists for you?
@CarloLucibello In your case, does the code run and show `true` on both `isapprox` lines?
I've just re-tested it and the problem is still present:
```julia
(@v1.4) pkg> st NNlib
Status `~/.julia/environments/v1.4/Project.toml`
  [872c559c] NNlib v0.6.6

julia> import Random

julia> import NNlib

julia> import NNlib: DenseConvDims

julia> Random.seed!(42);

julia> function ngradient(f, xs::AbstractArray...)
           grads = zero.(xs)
           for (x, Δ) in zip(xs, grads), i in 1:length(x)
               δ = sqrt(eps())
               tmp = x[i]
               x[i] = tmp - δ/2
               y1 = f(xs...)
               x[i] = tmp + δ/2
               y2 = f(xs...)
               x[i] = tmp
               Δ[i] = (y2-y1)/δ
           end
           return grads
       end
ngradient (generic function with 1 method)

julia> function conv_loss(x, w)
           cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
           y = NNlib.conv(x, w, cdims)
           return sum(y)
       end
conv_loss (generic function with 1 method)

julia> x = rand(7, 7, 3, 10); w = rand(3, 3, 3, 1);

julia> cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1);

julia> y = NNlib.conv(x, w, cdims);

julia> dy = ones(size(y));

julia> ndx, ndw = ngradient(conv_loss, x, w);

julia> dx = NNlib.∇conv_data(dy, w, cdims);

julia> dw = NNlib.∇conv_filter(x, dy, cdims);

julia> isapprox(dx, ndx, rtol=1e-5, atol=1e-5) # true
true

julia> isapprox(dw, ndw, rtol=1e-5, atol=1e-5) # false
false # <--- this should be true
```
> @CarloLucibello In your case, does the code work and show `true` on both lines?

Yes, I get `true` in both lines. Weird.
Do you know if it can depend on optional dependencies like NNPACK or something?
NNPACK looks like the only possible culprit. I don't have it:

```julia
julia> NNlib.is_nnpack_available()
false
```

Hm, for me it's also `false`.
Unless there's a better idea, I'll try it on a fresh installation of Julia later today, maybe another OS or something.
- Freshly installed Julia 1.4.1 on macOS: ok
- Re-created `.julia` directory from Julia on Linux Mint: broken
- Freshly installed Julia on Ubuntu 20.04: ok
So it's definitely something in my environment, but given that NNlib is pure Julia (except for NNPACK, which reportedly isn't used) and the Julia code is identical in all three cases, I don't really see where the difference could come from.
Are there any other dependencies on system libraries I'm missing?
If I activate NNPACK (by setting `ENV["NNLIB_USE_NNPACK"] = "true"`) and rebuild NNlib, the results are consistent with the numeric gradient:
Without NNPACK:
```julia
julia> dw[1:10] # NNlib's gradient
10-element Array{Float64,1}:
 77.24267625974863
 75.60615480391128
 75.97528738512742
 75.49146304275004
 73.38170666437483
 74.63221240244117
 71.85548233531136
 68.32006036928726
 69.4662272188591
 72.98388963724767

julia> ndw[1:10] # numeric gradient
10-element Array{Float64,1}:
 123.31095886230469
 122.17422485351562
 124.30633544921875
 122.17955017089844
 121.91676330566406
 125.45509338378906
 119.04501342773438
 115.77719116210938
 116.76512145996094
 122.80245971679688
```
With NNPACK:
```julia
julia> dw[1:10] # NNlib's gradient via NNPACK
10-element Array{Float64,1}:
 111.8546990507544
 108.91015741296108
 109.89558792982777
 109.61416120681227
 108.17139630935685
 110.89449325824589
 106.75714769997535
 103.4189106574869
 116.76510746200394
 122.80246892879077
```
Unfortunately, NNPACK performs a Float64 -> Float32 -> Float64 conversion, which destroys the numeric gradient (turning it mostly into zeros), so it cannot be used in the tests.
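The precision problem can be seen without NNlib at all: the perturbation `δ = sqrt(eps(Float64)) ≈ 1.5e-8` used by `ngradient` is smaller than Float32 resolution (`eps(Float32) ≈ 1.2e-7`), so after a round-trip through Float32 both perturbed evaluations collapse to the same value:

```julia
δ = sqrt(eps(Float64))          # ≈ 1.49e-8, the step used by ngradient
x = 1.0
y1 = Float64(Float32(x - δ/2))  # rounds back to 1.0
y2 = Float64(Float32(x + δ/2))  # rounds back to 1.0
(y2 - y1) / δ                   # 0.0 instead of the true derivative 1.0
```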
It seems like the default backend for `∇conv_filter` - `∇conv_filter_im2col` - has an element of randomness, which sometimes breaks the result. Here's a piece of a REPL session; note how the values change between calls (the first line of each output should be enough):
```julia
# wrong
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 99.2536 98.4235 106.757
 98.761 108.171 103.419
 99.9019 110.894 104.214
[:, :, 2, 1] =
 105.888 108.302 109.741
 105.845 107.46 114.824
 106.473 106.715 111.629
[:, :, 3, 1] =
 117.77 130.985 129.319
 115.742 130.63 129.331
 127.412 126.583 87.4029

# correct
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 115.174
 121.945 123.568 113.199
[:, :, 3, 1] =
 118.697 118.931 129.319
 117.099 118.254 129.331
 116.372 115.729 124.711

# correct
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711

# correct
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711

# wrong, different from the first case
julia> dw = NNlib.∇conv_filter_im2col(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 87.7973 85.8581 84.9976
 85.497 84.262 81.9135
 85.4911 84.5851 81.1097
[:, :, 2, 1] =
 81.0013 82.4951 84.56
 81.72 82.6341 98.9791
 80.7205 80.6145 96.6074
[:, :, 3, 1] =
 106.067 106.032 115.456
 106.388 107.386 116.069
 105.687 103.338 111.824
```
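One way to surface this kind of flakiness programmatically is simply to compare repeated calls (a sketch; `f` stands for any zero-argument closure, such as `() -> NNlib.∇conv_filter_im2col(x, dy, cdims)`). Since nondeterminism like this often comes from multithreading, comparing `Threads.nthreads()` across the affected environments might also be informative (an assumption on my part, not something verified in this thread):

```julia
# Returns true if n repeated calls of f all produce an identical result.
function is_deterministic(f; n = 10)
    ref = f()
    return all(f() == ref for _ in 2:n)
end

is_deterministic(() -> 42)      # a pure function is deterministic
is_deterministic(() -> rand())  # fresh random draws are not
```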
The good news is that another backend - `∇conv_filter_direct` - works consistently and correctly:
```julia
julia> dw = NNlib.∇conv_filter_direct(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711

julia> dw = NNlib.∇conv_filter_direct(x, dy, cdims)
3×3×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 123.311 122.18 119.045
 122.174 121.917 115.777
 124.306 125.455 116.765
[:, :, 2, 1] =
 122.802 125.232 125.662
 122.327 124.453 127.583
 121.945 123.568 125.158
[:, :, 3, 1] =
 130.69 130.985 129.319
 129.337 130.63 129.331
 127.412 126.583 124.711
```
(Five further calls to `NNlib.∇conv_filter_direct` produced identical output and are omitted here.)
I'm not in a good position to debug the implementation of `∇conv_filter_im2col()`, so in my library I just switched to the `_direct` version. Hopefully someone with better knowledge of im2col will be able to take a look at its implementation.
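For anyone wanting a reference to test against, the idea behind a direct filter gradient fits in a few lines of plain Julia. This is only a sketch of the simplest case (2-D, single channel, stride 1, no padding, no kernel flipping), and `filter_grad_direct` is a hypothetical helper, not NNlib's actual `∇conv_filter_direct`:

```julia
# For y[i, j] = Σ_{a,b} x[i+a-1, j+b-1] * w[a, b] (valid cross-correlation),
# the gradient of the loss w.r.t. the filter is
#   dL/dw[a, b] = Σ_{i,j} x[i+a-1, j+b-1] * dy[i, j],
# i.e. a cross-correlation of the input with the output sensitivities.
function filter_grad_direct(x::AbstractMatrix, dy::AbstractMatrix, kh::Int, kw::Int)
    dw = zeros(kh, kw)
    for a in 1:kh, b in 1:kw, i in 1:size(dy, 1), j in 1:size(dy, 2)
        dw[a, b] += x[i + a - 1, j + b - 1] * dy[i, j]
    end
    return dw
end

x  = rand(5, 5)
dy = ones(3, 3)                       # sensitivities of sum(y)
dw = filter_grad_direct(x, dy, 3, 3)  # dw[a, b] == sum(x[a:a+2, b:b+2])
```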
> It seems like the default backend for `∇conv_filter` - `∇conv_filter_im2col` - has an element of randomness, which sometimes breaks the result.
Not sure if this issue is related: I trained a model and found that even though I had set the RNG seed, the gradient of the convolution occasionally differs between runs (it's difficult for me to work out a MWE yet). This breaks reproducibility sometimes.
Looks like this was closed by #235 and just not updated? Feel free to re-open if that's not the case.