Clustering.jl

sparse matrices fail

Open swadey opened this issue 10 years ago • 5 comments

I get this error when calling kmeans on a sparse matrix:

julia> kmeans(x', 50)
ERROR: no method kmeans(SparseMatrixCSC{Float32,Int32}, Int64)

Could this be due to the StoredArray change in julia?

swadey avatar Mar 02 '14 18:03 swadey

BTW, I'm on julia HEAD: JuliaLang/julia@244cffc7d99b74fa2b7aab9efed812aeba7e4b38

swadey avatar Mar 02 '14 18:03 swadey

The algorithm itself is only for dense matrices.

We may add a k-means algorithm for sparse matrices sometime in the future. However, this is not very high on our priority list. A pull request may make this happen faster.

lindahua avatar Mar 02 '14 18:03 lindahua
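
Since the algorithm is dense-only, one possible workaround is to convert the sparse matrix to a dense one before clustering. The snippet below is a minimal sketch, not something proposed in this thread: it assumes the densified data fits in memory (rows × columns × element size bytes), it uses the `Matrix(...)` constructor from current Julia versions rather than the Julia HEAD of 2014, and the tiny matrix `x` is made-up illustration data.

```julia
using SparseArrays

# Made-up sparse data standing in for the reporter's matrix:
# 4 samples (rows) x 3 features (columns), Float32 values.
x = sparse([1, 2, 3], [1, 2, 3], Float32[1, 2, 3], 4, 3)

# Materialize the transpose as an ordinary dense Matrix{Float32},
# so that each column is one sample.
Xd = Matrix(x')

# Xd can now be passed to a dense-only routine, for example
# (left commented out so the sketch runs without Clustering.jl):
# kmeans(Xd, 50)
```

Densifying discards the memory savings of the sparse format, so this only makes sense when the full matrix is small enough to hold in RAM.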

@lindahua is there an actual dependency on dense vectors or just that it produces dense centroids? I don't know what the implementation is doing, but if it's doing some kind of kd-tree/ball-tree for a nearest neighbor approximation, that would make sense.

swadey avatar Mar 02 '14 19:03 swadey

The algorithm scans every element in a dense pattern when computing the means and the distances. The pairwise distance function only accepts dense matrices, because it relies on BLAS's gemm to compute the distances quickly.

lindahua avatar Mar 02 '14 21:03 lindahua

It does not use a kd-tree in any way; it just relies on BLAS to compute pairwise Euclidean distances.

lindahua avatar Mar 02 '14 21:03 lindahua
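
The BLAS-based distance computation described above can be sketched as follows. This is an illustration of the general technique, not Clustering.jl's actual code: the function name `pairwise_sqeuclidean` is hypothetical, and the sketch assumes samples are stored as columns. It uses the identity ‖x − c‖² = ‖x‖² + ‖c‖² − 2·xᵀc, so that the bulk of the work is one dense matrix product, which Julia dispatches to BLAS gemm for dense `Float32`/`Float64` matrices.

```julia
using LinearAlgebra

# Hypothetical helper: squared Euclidean distances between the columns of
# X (d x n, samples) and the columns of C (d x k, centers).
# Returns an n x k matrix D with D[i, j] = ||X[:, i] - C[:, j]||^2.
function pairwise_sqeuclidean(X::Matrix{T}, C::Matrix{T}) where {T<:AbstractFloat}
    xx = vec(sum(abs2, X; dims=1))   # squared norm of each sample, length n
    cc = vec(sum(abs2, C; dims=1))   # squared norm of each center, length k
    D = X' * C                       # n x k cross terms; one dense matrix product
    D .= xx .+ cc' .- 2 .* D         # combine the three terms in place
    D .= max.(D, zero(T))            # clamp small negatives caused by round-off
    return D
end

# Usage: two 2-D samples and a single center at the origin.
X = [0.0 3.0;
     0.0 4.0]
C = reshape([0.0, 0.0], 2, 1)
D = pairwise_sqeuclidean(X, C)
```

The signature is restricted to `Matrix` on purpose, mirroring the dense-only requirement discussed in the thread; a sparse input would need either conversion to dense or a separate sparse-aware method.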