NearestNeighbors.jl icon indicating copy to clipboard operation
NearestNeighbors.jl copied to clipboard

use ArrayOfArrays for return value to reduce the number of allocated arrays

Open KristofferC opened this issue 1 year ago • 4 comments

A long standing gripe for me has been that indices and distances are returned as standard nested Arrays. Typically, each inner array hold quite a small number of neighbors so it means that we allocate a large number of small arrays.

Using ArrayOfArrays, these are stored contigously in one large flat array instead.

The difference in allocations can be readily seen:

julia> input = rand(3, 10^6);

julia> tree = KDTree(rand(3, 10^6));

julia> @time knn(tree, input, 5);
  1.538003 seconds (2.00 M allocations: 221.253 MiB, 10.03% gc time)

julia> @time knn(tree, input, 5);
  1.489310 seconds (98 allocations: 189.884 MiB, 0.29% gc time)

KristofferC avatar Nov 22 '23 09:11 KristofferC

One potential issue with this is that getindex is slower on AoA so depending on how people use the return value they might encounter performance regressions.

KristofferC avatar Nov 22 '23 10:11 KristofferC

@nanosoldier runtests()

KristofferC avatar Jun 13 '24 12:06 KristofferC

Update on PkgEvalJob KristofferC/NearestNeighbors.jl@048c5a7 vs. KristofferC/NearestNeighbors.jl@bc645d1: Accepted

nanosoldier avatar Jun 13 '24 12:06 nanosoldier

Update on PkgEvalJob KristofferC/NearestNeighbors.jl@048c5a7 vs. KristofferC/NearestNeighbors.jl@bc645d1: Running

nanosoldier avatar Jun 14 '24 03:06 nanosoldier

The package evaluation job you requested has completed - possible new issues were detected. The full report is available.

nanosoldier avatar Jun 14 '24 04:06 nanosoldier