GPUArrays.jl icon indicating copy to clipboard operation
GPUArrays.jl copied to clipboard

Fix rounding problems in _mean function

Open gabrielpreviato opened this issue 2 years ago • 4 comments

As stated in https://github.com/JuliaGPU/CUDA.jl/issues/1773, with the current _mean function, for bigger arrays you get some rounding problems.

julia> A = CUDA.ones((640, 640, 32, 1))
640×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:

julia> T = float(eltype(A))
Float32

julia> λ = convert(T, inv(_mean_denom(A, dims)))
0.0015625f0

julia> sum(Base.Fix1(*,λ), A; dims)
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994

[:, :, 2, 1] =
 0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994

[:, :, 3, 1] =
 0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994

;;; …

[:, :, 30, 1] =
 0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994

[:, :, 31, 1] =
 0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994

[:, :, 32, 1] =
 0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994

This PR changes the order of the multiplication operation, once multiplying them before summing can cause some loss of precision with smaller numbers.

julia> A = CUDA.ones((640, 640, 32, 1))
640×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:

julia> T = float(eltype(A))
Float32

julia> λ = convert(T, inv(_mean_denom(A, dims)))
0.0015625f0

julia> sum(A; dims) .* λ
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 2, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 3, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

;;; …

[:, :, 30, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 31, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 32, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

Performance wise I can't say the difference it may cause (I'm not an expert on Julia operation performance). If anyone has ideas how to test it, I'm open to learn and try.

gabrielpreviato avatar Feb 14 '23 00:02 gabrielpreviato