NearestNeighbors.jl
Trees for non-Metrics?
For NLP it is common to want to use `CosineDist`, which is a `SemiMetric`.
This is not going to be compatible with the `BallTree`, I think, since ball-tree pruning relies on the triangle inequality, but it should be fine with the `BruteTree`.
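To make the semimetric point concrete, here is a minimal sketch showing that cosine distance can break the triangle inequality. The helper `cosine_dist` is a hypothetical hand-written function for illustration (it is not the `CosineDist` type itself), and the three vectors are arbitrary examples.

```julia
using LinearAlgebra  # dot, norm (standard library)

# Hypothetical helper: cosine distance written out by hand,
# i.e. 1 - cos(angle between x and y).
cosine_dist(x, y) = 1 - dot(x, y) / (norm(x) * norm(y))

a = [1.0, 0.0]
b = [1.0, 1.0]   # 45 degrees away from both a and c
c = [0.0, 1.0]   # 90 degrees away from a

d_ab = cosine_dist(a, b)
d_bc = cosine_dist(b, c)
d_ac = cosine_dist(a, c)

# A true metric would need d_ac <= d_ab + d_bc.
violates_triangle = d_ac > d_ab + d_bc
println((d_ab, d_bc, d_ac, violates_triangle))
```

Here `d_ac` is 1 while `d_ab + d_bc` is roughly 0.59, so going "through" `b` looks shorter than going directly. A tree that prunes with the triangle inequality could therefore discard the true nearest neighbor, whereas a brute-force search compares every pair and is unaffected.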
This would come in handy for `Clustering.jl`. What do you think, @KristofferC? Currently cluster assignment is performed by computing and storing all pairwise distances, which is quite bad in terms of memory (and it ends up being slower as well). It would be nice to use a `BruteTree` to get cluster assignments. Something similar to:
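```julia
using NearestNeighbors
using Distances  # provides SemiMetric and SqEuclidean

function get_cluster_assignments_nearest_neighbors(
    X::Matrix{T},
    centers::Matrix{T},
    distance::SemiMetric=SqEuclidean(),  # distance used to compare points with centers
) where {T}
    # Assumes BruteTree accepts the given distance; each column of `centers` is one center.
    brutetree = BruteTree(centers, distance)
    # For each column of X, look up its single nearest center.
    idx, distances = knn(brutetree, X, 1)
    return idx  # one 1-element index vector per column of X
end
```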
I asked for this in this PR https://github.com/JuliaStats/Clustering.jl/pull/238, but the idea there was to leverage something like `Distances.jl` or `NearestNeighbors` rather than implement this within the package.