NearestNeighbors.jl error with points as vector of points?

Great package! So fast! 😃 Anyway, I thought this MWE (based on the readme example except for using points instead of point) would work (since "points can also be a vector of other vectors where each element in the outer vector is considered a point.") but it throws an error:

julia> using NearestNeighbors

julia> data = rand(3, 10^4);

julia> k = 3 ;

julia> kdtree = KDTree(data) ;

julia> points = [rand(3) for i in 1:10]
10-element Array{Array{Float64,1},1}:
 [0.68304, 0.983004, 0.220862]
 [0.626807, 0.61079, 0.968843]
 [0.192098, 0.157959, 0.300556]
 [0.300309, 0.502549, 0.653381]
 [0.160819, 0.653002, 0.9979]
 [0.809764, 0.0521815, 0.0290807]
 [0.20466, 0.592894, 0.656526]
 [0.370667, 0.0579244, 0.930079]
 [0.593999, 0.934687, 0.555133]
 [0.60504, 0.78062, 0.243998]

julia> idxs, dists = knn(kdtree, points, k, true)
ERROR: MethodError: no method matching length(::Type{Array{Float64,1}})
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:561
  length(::Base.MethodList) at reflection.jl:801
  length(::Core.MethodTable) at reflection.jl:875
  ...
Stacktrace:
 [1] check_input(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/NearestNeighbors.jl:29
 [2] knn(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}, ::Int64, ::Bool, ::Function) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/knn.jl:16 (repeats 2 times)
 [3] top-level scope at none:0

Maybe I'm doing this wrong?

Jun 13 '19 03:06 briochemc

You could use StaticArrays

using StaticArrays, NearestNeighbors
data = rand(3, 10^4);
k = 3;
kdtree = KDTree(data);
points = [@SVector rand(3) for i in 1:10]
idxs, dists = knn(kdtree, points, k, true)

I think this should be reflected in the README, or otherwise there should be a silent attempt internally to turn a vector of vectors into a vector of static vectors. The overall reason for this requirement is that in a vector of vectors the elements may have different lengths, in contrast to points being a matrix, where each column must have the same length. Differing lengths, however, cannot be inferred from the AbstractVector type, and checking it would probably add to the runtime, I'm not sure, however, by how much.

Jun 13 '19 07:06 dkarrasch

Oh I see, that makes sense. (However, I have been using a 3xn array instead for large n.)

I have two questions then:

Is it more performant to use (and generate) a 2D Array or a Vector{SVector}?
What about the data used to generated the tree? I.e., would data = [@SVector rand(3) for i in 1:10^4] be better than data = rand(3, 10^4)?

Jun 13 '19 08:06 briochemc

@KristofferC may help out if I'm saying something wrong, as I haven't written or modified any parts of the code, just scanned quickly through it some time ago. If I am not mistaken, it is always better to have things in the vector of SVector format, because internally, that's what 2D arrays are converted to anyway.

https://github.com/KristofferC/NearestNeighbors.jl/blob/faf05fb2652780d27ce50a5504dc26f0bfcd3311/src/kd_tree.jl#L69-L85

The reason likely is that once you have things in SVector format, the length of vectors and hence the dimension of the space is known to the compiler, which may help performance. But you could check and benchmark, and report here. I'd be interested as well.

Jun 13 '19 08:06 dkarrasch

The key point in the docs is:

or it can be a Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) is defined

So you should be able to use MVector as well which is mutable. Sure, we could convert an Array of Vectors and assert that all vectors have the same length.

Jun 14 '19 10:06 KristofferC

Oh I see now. I got confused by the fact that eltype(V) and length(V) are not the same as eltype(v::V) and length(v::V)... May I suggest adding "(e.g., V = SVector{...})" and an example to the docs with SVectors?

Jun 14 '19 23:06 briochemc

or it can be a Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) is defined

But that should be true for a vector of vectors, right? Yet I get

julia> a = [rand(1:10, 2) for i in 1:5]
5-element Array{Array{Int64,1},1}:
 [1, 9]
 [6, 3]
 [2, 7]
 [7, 3]
 [9, 7]

julia> KDTree(a)
ERROR: MethodError: no method matching length(::Type{Array{Int64,1}})
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:593
  length(::Base.MethodList) at reflection.jl:832
  length(::Core.MethodTable) at reflection.jl:906
  ...
Stacktrace:
 [1] NearestNeighbors.TreeData(::Array{Array{Int64,1},1}, ::Int64) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/tree_data.jl:14
 [2] #KDTree#16(::Int64, ::Bool, ::Bool, ::Array{Array{Int64,1},1}, ::Type{KDTree}, ::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:35
 [3] KDTree(::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:33 (repeats 2 times)
 [4] top-level scope at none:0

Oct 23 '19 20:10 mkborregaard

But that should be true for a vector of vectors, right?

No? The error message told you that it wasn't.

Oct 23 '19 21:10 KristofferC

Well. In that case I don't understand what's meant by that part of the docs.

Oct 23 '19 21:10 mkborregaard

Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.

Oct 23 '19 21:10 mkborregaard

Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V)

For a Vector{Vector{Float64}}, V is Vector{Float64} and length(V) (which is length(Vector{Float64})) is not defined and is what the error message complains about. If instead you had a Vector{SVector{3, Float64}} then V is SVector{3, Float64} and length(V) is 3.

Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.

A Vector{Vector{Float64}} is an inefficent way of storing a point cloud and users calling NearestNeighbors with such a data structure would have bad performance and blame the library. It is too general of a data structure since the points in such a data structure can have different dimensions (we now need to add extra code to do boundschecks) but the library requires that all points have the same dimension.

Oct 24 '19 07:10 KristofferC

Yeah OK I see, the length needs to be part of the type.

Oct 24 '19 20:10 mkborregaard

NearestNeighbors.jl NearestNeighbors.jl copied to clipboard

error with points as vector of points?

NearestNeighbors.jl
NearestNeighbors.jl copied to clipboard