ParallelKMeans.jl
Parallel & lightning fast implementation of available classic and contemporary variants of the KMeans clustering algorithm
As far as I can tell, only the `kmeans++` initialization is currently implemented. Looking at https://www.mdpi.com/1999-4893/14/1/6 "Improving Scalable K-Means++", it looks like `SRPK-means‖` could be a good method to have available :slightly_smiling_face:.
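For reference when comparing initializers, here is a minimal sketch of standard k-means++ seeding. It is not the package's implementation: `kmeanspp_seed` is a hypothetical name, and it assumes columns of `X` are observations. The first center is drawn uniformly, and each further center is drawn with probability proportional to its squared distance to the nearest center chosen so far.

```julia
using Random

# Hypothetical helper (not ParallelKMeans.jl API): k-means++ seeding sketch.
# Columns of X are observations; returns a d×k matrix of chosen centers.
function kmeanspp_seed(rng::AbstractRNG, X::AbstractMatrix{<:Real}, k::Integer)
    n = size(X, 2)
    1 <= k <= n || throw(ArgumentError("k must be in 1:$n"))
    sqdist(i, j) = sum(abs2, view(X, :, i) .- view(X, :, j))
    chosen = Vector{Int}(undef, k)
    chosen[1] = rand(rng, 1:n)                     # first center: uniform
    D2 = [sqdist(i, chosen[1]) for i in 1:n]       # squared distance to nearest chosen center
    for j in 2:k
        c = cumsum(D2)
        if c[end] == 0
            # every point coincides with a chosen center: fall back to a uniform draw
            chosen[j] = rand(rng, 1:n)
        else
            r = rand(rng) * c[end]
            chosen[j] = searchsortedlast(c, r) + 1  # first index with c[i] > r, i.e. P ∝ D2
        end
        for i in 1:n
            D2[i] = min(D2[i], sqdist(i, chosen[j]))
        end
    end
    return X[:, chosen]
end
```

Because only points with a positive squared distance can be selected after the first draw, the seeds are always distinct whenever enough distinct points exist.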
I have a working implementation of the lightweight coresets paper (https://las.inf.ethz.ch/files/bachem18scalable.pdf) in Julia. It's not distributed yet (I only have one machine to run it on anyway) but if you...
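The core of the lightweight coreset construction in the linked paper is a single importance-sampling pass: each point is sampled with probability q(x) = 1/(2n) + d(x, μ)² / (2 Σ d(x', μ)²), where μ is the data mean, and a sampled point gets weight 1/(m·q(x)). The sketch below is an illustration under that reading, not the contributor's implementation; `lightweight_coreset` is a hypothetical name and columns of `X` are assumed to be observations.

```julia
using Random, Statistics

# Hypothetical helper: lightweight coreset sketch (sampling with replacement).
# Returns the m sampled columns of X and their importance weights.
function lightweight_coreset(rng::AbstractRNG, X::AbstractMatrix{<:Real}, m::Integer)
    n = size(X, 2)
    mu = vec(mean(X; dims = 2))                      # data mean
    d2 = [sum(abs2, view(X, :, i) .- mu) for i in 1:n]
    total = sum(d2)
    # half uniform, half proportional to squared distance from the mean
    q = total > 0 ? (0.5 / n) .+ 0.5 .* d2 ./ total : fill(1.0 / n, n)
    c = cumsum(q)
    idx = [min(searchsortedlast(c, rand(rng) * c[end]) + 1, n) for _ in 1:m]
    w = 1 ./ (m .* q[idx])                           # importance weights
    return X[:, idx], w
end
```

The uniform half of q keeps every weight bounded by 2n/m, which is what makes the weighted sample a stable stand-in for the full data set.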
Currently, `YinYang` works only with the Euclidean metric, since its main internal functions rely heavily on the exact form of the metric calculation. The algorithm should be generalized (everywhere you see `sqrt`...
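One reason the `sqrt` calls matter: Yinyang-style pruning relies on triangle-inequality bounds, which hold for a true metric such as Euclidean distance but not for squared Euclidean distance. A tiny check on three collinear points illustrates this; `sqeuclid` and `euclid` are local illustrative definitions, not package functions.

```julia
# Illustrative definitions (not ParallelKMeans.jl API)
sqeuclid(a, b) = sum(abs2, a .- b)
euclid(a, b) = sqrt(sqeuclid(a, b))

x, y, z = [0.0], [1.0], [2.0]
sq_ok = sqeuclid(x, z) <= sqeuclid(x, y) + sqeuclid(y, z)  # false: 4.0 > 1.0 + 1.0
eu_ok = euclid(x, z) <= euclid(x, y) + euclid(y, z)        # true: 2.0 <= 1.0 + 1.0
```

So a generalized version would need either a metric that satisfies the triangle inequality or bounds derived for the specific distance in use.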
We have lots of manual unpacking; it would be nice to switch to the UnPack.jl library, after thorough benchmarking, of course.
Currently we implement only the `SqEuclidean` metric, but we can add support for all other metrics in Distances.jl in the same manner as is done in https://github.com/JuliaStats/Distances.jl/blob/master/src/generic.jl#L45 We should...
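A minimal sketch of what a metric-generic assignment step could look like, assuming the distance is passed in as any callable `dist(a, b)`. `assign_labels` is a hypothetical helper, not the package's API. Distances.jl metric objects such as `SqEuclidean()` or `Cityblock()` can be called as `dist(a, b)`, so they would fit this signature, but the sketch itself only depends on Base.

```julia
# Hypothetical helper: label each column of X with the index of the nearest
# column of C under an arbitrary callable distance `dist(a, b)`.
function assign_labels(dist, X::AbstractMatrix, C::AbstractMatrix)
    n, k = size(X, 2), size(C, 2)
    labels = Vector{Int}(undef, n)
    for i in 1:n
        x = view(X, :, i)
        best, bestd = 1, dist(x, view(C, :, 1))
        for j in 2:k
            dj = dist(x, view(C, :, j))
            if dj < bestd
                best, bestd = j, dj
            end
        end
        labels[i] = best
    end
    return labels
end
```

Passing the metric as a value lets the compiler specialize the loop per metric type, which is the usual way to keep such generic code fast in Julia.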
It would be great to provide an interface for researchers to cite this project. [Zenodo](https://zenodo.org/) seems like a good choice but other alternatives should be explored as well.
Some refactoring is needed, and we are more or less ready for these changes. I put them all here together, but they can be split into separate issues later. -...
As a future step after the implementation of point-wise parallel computations, it would make sense to improve the algorithm with "Fast kmeans" techniques. Several approaches exist; here is some inspirational...
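As an illustration of the kind of pruning these techniques use, here is a sketch of Elkan's center-to-center bound: if d(c_b, c_j) ≥ 2·d(x, c_b), then d(x, c_j) ≥ d(x, c_b), so c_j cannot be strictly closer and its distance need not be computed. This is one example technique, not the package's code; `assign_with_pruning` is a hypothetical name, and it recomputes center distances on every call for brevity.

```julia
# Illustrative Euclidean distance (the bound requires the triangle inequality)
euclid(a, b) = sqrt(sum(abs2, a .- b))

# Hypothetical helper: nearest-center search for one point x, skipping centers
# ruled out by Elkan's lemma. Returns (index of nearest center, number skipped).
function assign_with_pruning(x::AbstractVector, C::AbstractMatrix)
    k = size(C, 2)
    # center-to-center distances; real code would cache these once per iteration
    CC = [euclid(view(C, :, a), view(C, :, b)) for a in 1:k, b in 1:k]
    best, dbest = 1, euclid(x, view(C, :, 1))
    skipped = 0
    for j in 2:k
        if CC[best, j] >= 2 * dbest   # c_j provably no closer than c_best
            skipped += 1
            continue
        end
        dj = euclid(x, view(C, :, j))
        if dj < dbest
            best, dbest = j, dj
        end
    end
    return best, skipped
end
```

For a point already close to its current center, most candidates are skipped without a distance evaluation, which is where the speedup over plain Lloyd iterations comes from.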