Querying number of distance evaluations
Is there a way to determine the number of distance calculations performed during the kNN computation? The brute-force approach requires evaluating the distances between every query and corpus point, resulting in $n_q \times n_c$ calculations. However, methods like KDTree or BallTree can potentially reduce this number, especially in low-dimensional spaces.
Does the package provide a mechanism to retrieve the actual number of distance evaluations during the kNN computation?
Thank you!
That's a good idea!
There used to be a "debug" mode that you could activate that added some stats but it was removed in https://github.com/KristofferC/NearestNeighbors.jl/pull/98/files#diff-3d112593fe19c53f85f19846cc9d0406c17b39736378b65e7d12821d20207302.
With more experienced eyes, I think a better alternative is to allow one to pass something like a
struct EvaluationData
n_distance_evals
n_tree_intersection_evals
...
end
to inrange and knn and have that struct be populated. By default that argument would just be nothing and unused.
I made a minor modification on my fork to include a global counter:
https://github.com/KristofferC/NearestNeighbors.jl/compare/master...fcdimitr:NearestNeighbors.jl:master
It works correctly, and some preliminary benchmarking shows that it does not affect the running time.
In any case, I agree that a more structured approach to collecting statistics should be preferred.
Another thing that can be used is GFlops.jl (requires the https://github.com/charleskawczynski/GFlops.jl/tree/ck/julia1.10 branch).
julia> tree = BallTree(rand(3,10^5));
julia> v = rand(3)
3-element Vector{Float64}:
0.38248598898980024
0.6378296367207168
0.9255061078525778
julia> @count_ops knn(tree, v, 4)
Flop Counter: 9124 flop
┌──────┬─────────┐
│ │ Float64 │
├──────┼─────────┤
│ add │ 2184 │
│ sub │ 3300 │
│ mul │ 2912 │
│ sqrt │ 728 │
└──────┴─────────┘