LoopVectorization.jl icon indicating copy to clipboard operation
LoopVectorization.jl copied to clipboard

Bug in histogramming

Open IlianPihlajamaa opened this issue 4 years ago • 2 comments
trafficstars

Today I think I found a bug. See the minimal example below. I am essentially trying to compute a pairwise distance histogram. I have an array with 3D- particle positions of size 3xNxN_timesteps, in which 3 is the number of dimensions, N is the number of particles, and N_timesteps is the number of independent sets of particle positions.

To calculate the histogram I am doing a double loop over all particles for each time step. In each iteration, I calculate the distance between particle 1 and particle 2, and add the binned result to an integer array h. In cases with distances larger than those that interest me and for cases of distances between a particle and itself I add to h[1].

Thanks for looking at this!

using LoopVectorization, IfElse

function pair_hist(r, Nbins, box_size)
    Ndim, N, N_timesteps = size(r)
    bin_edges = collect(LinRange(0.0, 5.0, Nbins+1))
    binsize = bin_edges[2] - bin_edges[1]
    h = zeros(Int64, Nbins)
    for t = 1:N_timesteps
       for particle1 = 1:N
            for particle2 = 1:N
               # pair distances
               dx = r[1, particle1, t] - r[1, particle2, t]
               dy = r[2, particle1, t] - r[2, particle2, t]
               dz = r[3, particle1, t] - r[3, particle2, t]
               # distance
               dr = sqrt(dx^2+dy^2+dz^2)
               # Find the index for binning
               index = ceil(Int64, dr / binsize)
               # if index=0 (distance of 0, meaning particle1==particle2)
               # and if index > Nbins, (outside of our interest),
               # we add it to the first element of h (which we can later throw away)
               index2 = ifelse(0 < index <= Nbins, index, 1)
               h[index2] += 1
           end
       end
   end
   return h
end

function pair_hist_turbo(r, Nbins, box_size)
    Ndim, N, N_timesteps = size(r)
    bin_edges = collect(LinRange(0.0, 5.0, Nbins+1))
    binsize = bin_edges[2] - bin_edges[1]
    h = zeros(Int64, Nbins)
    @turbo for t = 1:N_timesteps
       for particle1 = 1:N
            for particle2 = 1:N
               # pair distances
               dx = r[1, particle1, t] - r[1, particle2, t]
               dy = r[2, particle1, t] - r[2, particle2, t]
               dz = r[3, particle1, t] - r[3, particle2, t]
               # distance
               dr = sqrt(dx^2+dy^2+dz^2)
               # Find the index for binning
               index = ceil(Int64, dr / binsize)
               # if index=0 (distance of 0, meaning particle1==particle2)
               # and if index > Nbins, (outside of our interest),
               # we add it to the first element of h (which we can later throw away)
               index2 = ifelse(0 < index <= Nbins, index, 1)
               h[index2] += 1
           end
       end
   end
   return h
end

function pair_hist_tturbo(r, Nbins, box_size)
    Ndim, N, N_timesteps = size(r)
    bin_edges = collect(LinRange(0.0, 5.0, Nbins+1))
    binsize = bin_edges[2] - bin_edges[1]
    h = zeros(Int64, Nbins)
    @tturbo for t = 1:N_timesteps
       for particle1 = 1:N
            for particle2 = 1:N
               # pair distances
               dx = r[1, particle1, t] - r[1, particle2, t]
               dy = r[2, particle1, t] - r[2, particle2, t]
               dz = r[3, particle1, t] - r[3, particle2, t]
               # distance
               dr = sqrt(dx^2+dy^2+dz^2)
               # Find the index for binning
               index = ceil(Int64, dr / binsize)
               # if index=0 (distance of 0, meaning particle1==particle2)
               # and if index > Nbins, (outside of our interest),
               # we add it to the first element of h (which we can later throw away)
               index2 = ifelse(0 < index <= Nbins, index, 1)
               h[index2] += 1
           end
       end
   end
   return h
end

box_size = 10.0
r = rand(3, 1000, 10)*box_size
a = pair_hist(r, 100, box_size)
b = pair_hist_turbo(r, 100, box_size)
c = pair_hist_tturbo(r, 100, box_size)
println(sum(a)) # 10000000 (correct)
println(sum(b)) # 5266141 (not good)
println(sum(c)) # 2918555 (not good)

IlianPihlajamaa avatar Nov 10 '21 17:11 IlianPihlajamaa

This is hard to solve, because LV works in parallel. Thus if it tries to read, increment, and write to the same index of h in parallel, the answer will be wrong. https://github.com/JuliaSIMD/LoopVectorization.jl/issues/325

There are potential fixes, but they'd be tricky to implement. For the moment, I want to prioritize rewriting LoopVectorization. I may be able to solve this after that.

chriselrod avatar Nov 10 '21 19:11 chriselrod

Alright, I expected something like this.

Thanks anyway for a great package!

IlianPihlajamaa avatar Nov 10 '21 20:11 IlianPihlajamaa