LoopVectorization.jl
LoopVectorization.jl copied to clipboard
Bug in histogramming
Today I think I found a bug. See the minimal example below. I am essentially trying to compute a pairwise distance histogram. I have an array with 3D- particle positions of size 3xNxN_timesteps, in which 3 is the number of dimensions, N is the number of particles, and N_timesteps is the number of independent sets of particle positions.
To calculate the histogram I am doing a double loop over all particles for each time step. In each iteration, I calculate the distance between particle 1 and particle 2, and add the binned result to an integer array h. In cases with distances larger than those that interest me and for cases of distances between a particle and itself I add to h[1].
Thanks for looking at this!
using LoopVectorization, IfElse
function pair_hist(r, Nbins, box_size)
Ndim, N, N_timesteps = size(r)
bin_edges = collect(LinRange(0.0, 5.0, Nbins+1))
binsize = bin_edges[2] - bin_edges[1]
h = zeros(Int64, Nbins)
for t = 1:N_timesteps
for particle1 = 1:N
for particle2 = 1:N
# pair distances
dx = r[1, particle1, t] - r[1, particle2, t]
dy = r[2, particle1, t] - r[2, particle2, t]
dz = r[3, particle1, t] - r[3, particle2, t]
# distance
dr = sqrt(dx^2+dy^2+dz^2)
# Find the index for binning
index = ceil(Int64, dr / binsize)
# if index=0 (distance of 0, meaning particle1==particle2)
# and if index > Nbins, (outside of our interest),
# we add it to the first element of h (which we can later throw away)
index2 = ifelse(0 < index <= Nbins, index, 1)
h[index2] += 1
end
end
end
return h
end
function pair_hist_turbo(r, Nbins, box_size)
Ndim, N, N_timesteps = size(r)
bin_edges = collect(LinRange(0.0, 5.0, Nbins+1))
binsize = bin_edges[2] - bin_edges[1]
h = zeros(Int64, Nbins)
@turbo for t = 1:N_timesteps
for particle1 = 1:N
for particle2 = 1:N
# pair distances
dx = r[1, particle1, t] - r[1, particle2, t]
dy = r[2, particle1, t] - r[2, particle2, t]
dz = r[3, particle1, t] - r[3, particle2, t]
# distance
dr = sqrt(dx^2+dy^2+dz^2)
# Find the index for binning
index = ceil(Int64, dr / binsize)
# if index=0 (distance of 0, meaning particle1==particle2)
# and if index > Nbins, (outside of our interest),
# we add it to the first element of h (which we can later throw away)
index2 = ifelse(0 < index <= Nbins, index, 1)
h[index2] += 1
end
end
end
return h
end
function pair_hist_tturbo(r, Nbins, box_size)
Ndim, N, N_timesteps = size(r)
bin_edges = collect(LinRange(0.0, 5.0, Nbins+1))
binsize = bin_edges[2] - bin_edges[1]
h = zeros(Int64, Nbins)
@tturbo for t = 1:N_timesteps
for particle1 = 1:N
for particle2 = 1:N
# pair distances
dx = r[1, particle1, t] - r[1, particle2, t]
dy = r[2, particle1, t] - r[2, particle2, t]
dz = r[3, particle1, t] - r[3, particle2, t]
# distance
dr = sqrt(dx^2+dy^2+dz^2)
# Find the index for binning
index = ceil(Int64, dr / binsize)
# if index=0 (distance of 0, meaning particle1==particle2)
# and if index > Nbins, (outside of our interest),
# we add it to the first element of h (which we can later throw away)
index2 = ifelse(0 < index <= Nbins, index, 1)
h[index2] += 1
end
end
end
return h
end
box_size = 10.0
r = rand(3, 1000, 10)*box_size
a = pair_hist(r, 100, box_size)
b = pair_hist_turbo(r, 100, box_size)
c = pair_hist_tturbo(r, 100, box_size)
println(sum(a)) # 10000000 (correct)
println(sum(b)) # 5266141 (not good)
println(sum(c)) # 2918555 (not good)
This is hard to solve, because LV works in parallel. Thus if it tries to read, increment, and write to the same index of h in parallel, the answer will be wrong.
https://github.com/JuliaSIMD/LoopVectorization.jl/issues/325
There are potential fixes, but they'd be tricky to implement. For the moment, I want to prioritize rewriting LoopVectorization. I may be able to solve this after that.
Alright, I expected something like this.
Thanks anyway for a great package!