NearestNeighbors.jl

Trees for non-Metrics?

Open oxinabox opened this issue 6 years ago • 1 comments

For NLP it is common to want to use CosineDist, which is a SemiMetric.

This is not going to be compatible with the BallTree, I think.

But it should be fine with the BruteTree.
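For context, here is a small sketch of why a SemiMetric is a problem for ball-style pruning: cosine distance can violate the triangle inequality, which that pruning relies on, while a brute-force scan never uses it. The helper `cosine_dist` is a hand-written stand-in (an assumption, not the Distances.jl implementation), so the snippet needs only the standard library.

```julia
using LinearAlgebra

# Hand-written cosine distance, for illustration only.
cosine_dist(a, b) = 1 - dot(a, b) / (norm(a) * norm(b))

x = [1.0, 0.0]
y = [1.0, 1.0]
z = [0.0, 1.0]

d_xz = cosine_dist(x, z)   # orthogonal vectors
d_xy = cosine_dist(x, y)   # 45 degrees apart
d_yz = cosine_dist(y, z)   # 45 degrees apart

# A true metric would guarantee d_xz <= d_xy + d_yz.
violates_triangle = d_xz > d_xy + d_yz
println("d(x,z) = ", d_xz, ", d(x,y) + d(y,z) = ", d_xy + d_yz)
println("triangle inequality violated: ", violates_triangle)
```

Going "through" `y` is shorter than going directly from `x` to `z`, so any bound derived from the triangle inequality could wrongly discard a true nearest neighbor.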

oxinabox, Sep 28 '18 10:09

This would come in handy for Clustering.jl. What do you think, @KristofferC? Currently cluster assignment is performed by computing and storing all pairwise distances, which is quite bad in terms of memory (and it ends up being slower as well). It would be nice to use a BruteTree to get cluster assignments. Something similar to:

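using NearestNeighbors
using Distances  # provides SemiMetric and SqEuclidean

function get_cluster_assignments_nearest_neighbors(
    X::Matrix{T},
    centers::Matrix{T},
    distance::SemiMetric=SqEuclidean(),  # in: distance used to compare points and centers
) where {T}
    # Brute-force search structure over the cluster centers (one center per column).
    brutetree = BruteTree(centers, distance)
    # With k = 1, knn returns a one-element index vector for each column of X.
    idxs, dists = knn(brutetree, X, 1)

    # Flatten to a single center index per point.
    return first.(idxs)
end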

I asked for this in this PR, https://github.com/JuliaStats/Clustering.jl/pull/238, but the idea there was to leverage something like Distances.jl or NearestNeighbors.jl rather than implement this within the package.
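To make the memory argument concrete, here is a minimal sketch of what a brute-force assignment amounts to: a linear scan that keeps only the best center seen so far for each point, so the full points-by-centers distance matrix is never stored. This is not code from NearestNeighbors.jl or Clustering.jl; `assign_nearest` and `cosine_dist` are hypothetical names used only for illustration, and any two-argument distance function can be passed in.

```julia
using LinearAlgebra

# Hypothetical helper: assign each column of X to its nearest column of centers.
function assign_nearest(X::AbstractMatrix, centers::AbstractMatrix, dist)
    n = size(X, 2)
    assignments = Vector{Int}(undef, n)
    for j in 1:n
        x = view(X, :, j)
        best_idx, best_d = 0, Inf
        # Linear scan over the centers; only the running minimum is kept.
        for c in 1:size(centers, 2)
            d = dist(x, view(centers, :, c))
            if d < best_d
                best_idx, best_d = c, d
            end
        end
        assignments[j] = best_idx
    end
    return assignments
end

# Hand-written cosine distance (a semimetric), for illustration only.
cosine_dist(a, b) = 1 - dot(a, b) / (norm(a) * norm(b))

X = [1.0 0.1 0.9;
     0.0 1.0 0.2]          # three points, one per column
centers = [1.0 0.0;
           0.0 1.0]        # two centers, one per column

assignments = assign_nearest(X, centers, cosine_dist)
println(assignments)
```

Extra memory is one integer per point, instead of one distance per (point, center) pair. Nothing in the scan relies on the triangle inequality, which is why a semimetric is acceptable here.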

davidbp, Mar 04 '23 22:03