NearestNeighbors.jl
NearestNeighbors.jl copied to clipboard
error with points as vector of points?
Great package! So fast! 😃 Anyway, I thought this MWE (based on the readme example except for using points
instead of point
) would work (since "points
can also be a vector of other vectors where each element in the outer vector is considered a point.") but it throws an error:
julia> using NearestNeighbors
julia> data = rand(3, 10^4);
julia> k = 3 ;
julia> kdtree = KDTree(data) ;
julia> points = [rand(3) for i in 1:10]
10-element Array{Array{Float64,1},1}:
[0.68304, 0.983004, 0.220862]
[0.626807, 0.61079, 0.968843]
[0.192098, 0.157959, 0.300556]
[0.300309, 0.502549, 0.653381]
[0.160819, 0.653002, 0.9979]
[0.809764, 0.0521815, 0.0290807]
[0.20466, 0.592894, 0.656526]
[0.370667, 0.0579244, 0.930079]
[0.593999, 0.934687, 0.555133]
[0.60504, 0.78062, 0.243998]
julia> idxs, dists = knn(kdtree, points, k, true)
ERROR: MethodError: no method matching length(::Type{Array{Float64,1}})
Closest candidates are:
length(::Core.SimpleVector) at essentials.jl:561
length(::Base.MethodList) at reflection.jl:801
length(::Core.MethodTable) at reflection.jl:875
...
Stacktrace:
[1] check_input(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/NearestNeighbors.jl:29
[2] knn(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}, ::Int64, ::Bool, ::Function) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/knn.jl:16 (repeats 2 times)
[3] top-level scope at none:0
Maybe I'm doing this wrong?
You could use StaticArrays
using StaticArrays, NearestNeighbors
data = rand(3, 10^4);
k = 3;
kdtree = KDTree(data);
points = [@SVector rand(3) for i in 1:10]
idxs, dists = knn(kdtree, points, k, true)
I think this should be reflected in the README, or otherwise there should be a silent attempt internally to turn a vector of vectors into a vector of static vectors. The overall reason for this requirement is that in a vector of vectors the elements may have different lengths, in contrast to points
being a matrix, where each column must have the same length. Differing lengths, however, cannot be inferred from the AbstractVector type, and checking it would probably add to the runtime, I'm not sure, however, by how much.
Oh I see, that makes sense. (However, I have been using a 3xn array instead for large n.)
I have two questions then:
- Is it more performant to use (and generate) a 2D
Array
or aVector{SVector}
? - What about the data used to generated the tree? I.e., would
data = [@SVector rand(3) for i in 1:10^4]
be better thandata = rand(3, 10^4)
?
@KristofferC may help out if I'm saying something wrong, as I haven't written or modified any parts of the code, just scanned quickly through it some time ago. If I am not mistaken, it is always better to have things in the vector of SVector
format, because internally, that's what 2D arrays are converted to anyway.
https://github.com/KristofferC/NearestNeighbors.jl/blob/faf05fb2652780d27ce50a5504dc26f0bfcd3311/src/kd_tree.jl#L69-L85
The reason likely is that once you have things in SVector
format, the length of vectors and hence the dimension of the space is known to the compiler, which may help performance. But you could check and benchmark, and report here. I'd be interested as well.
The key point in the docs is:
or it can be a
Vector{V}
whereV
is itself a subtype of anAbstractVector
and such thateltype(V)
andlength(V)
is defined
So you should be able to use MVector
as well which is mutable. Sure, we could convert an Array of Vectors and assert that all vectors have the same length.
Oh I see now. I got confused by the fact that eltype(V)
and length(V)
are not the same as eltype(v::V)
and length(v::V)
... May I suggest adding "(e.g., V = SVector{...}
)" and an example to the docs with SVectors
?
or it can be a Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) is defined
But that should be true for a vector of vectors, right? Yet I get
julia> a = [rand(1:10, 2) for i in 1:5]
5-element Array{Array{Int64,1},1}:
[1, 9]
[6, 3]
[2, 7]
[7, 3]
[9, 7]
julia> KDTree(a)
ERROR: MethodError: no method matching length(::Type{Array{Int64,1}})
Closest candidates are:
length(::Core.SimpleVector) at essentials.jl:593
length(::Base.MethodList) at reflection.jl:832
length(::Core.MethodTable) at reflection.jl:906
...
Stacktrace:
[1] NearestNeighbors.TreeData(::Array{Array{Int64,1},1}, ::Int64) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/tree_data.jl:14
[2] #KDTree#16(::Int64, ::Bool, ::Bool, ::Array{Array{Int64,1},1}, ::Type{KDTree}, ::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:35
[3] KDTree(::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:33 (repeats 2 times)
[4] top-level scope at none:0
But that should be true for a vector of vectors, right?
No? The error message told you that it wasn't.
Well. In that case I don't understand what's meant by that part of the docs.
Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.
Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V)
For a Vector{Vector{Float64}}
, V
is Vector{Float64}
and length(V)
(which is length(Vector{Float64})
) is not defined and is what the error message complains about. If instead you had a Vector{SVector{3, Float64}}
then V
is SVector{3, Float64}
and length(V)
is 3.
Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.
A Vector{Vector{Float64}}
is an inefficent way of storing a point cloud and users calling NearestNeighbors with such a data structure would have bad performance and blame the library. It is too general of a data structure since the points in such a data structure can have different dimensions (we now need to add extra code to do boundschecks) but the library requires that all points have the same dimension.
Yeah OK I see, the length needs to be part of the type.