Cluster Algorithms from Scratch with Julia Lang. (K-Means and DBSCAN)

ClusterAnalysis.jl

This package was built from scratch, entirely in Julia Lang, and implements a few popular clustering algorithms like K-Means and DBSCAN.

This is mostly a learning experiment, but the package was also built and documented to be used by anyone, plug-and-play. Just pass your data as an Array or a Tables.jl-compatible type (such as a DataFrames.jl DataFrame), then train your clustering algorithms and analyze the results.

Documentation: https://augustocl.github.io/ClusterAnalysis.jl/

Algorithms Implemented

Two types of algorithms are currently implemented: a partition-based one (K-Means) and a spatial density-based one (DBSCAN).
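To make the partition-based idea concrete, here is a minimal sketch of the classic K-Means loop (Lloyd's algorithm) with random initialization, written with the standard library only. It is not the code used by ClusterAnalysis.jl; the names `sqdist` and `lloyd_kmeans` and the keyword arguments are hypothetical and chosen for illustration, and the data is assumed to be a matrix with one observation per row.

```julia
using Random
using Statistics

# Squared Euclidean distance between two vectors.
sqdist(a, b) = sum(abs2, a .- b)

# Hypothetical helper, NOT the ClusterAnalysis.jl API.
# X is an n×d matrix (one observation per row); returns (assignments, centroids).
function lloyd_kmeans(X::AbstractMatrix{<:Real}, k::Integer;
                      maxiter::Integer=100,
                      rng::AbstractRNG=Random.default_rng())
    n = size(X, 1)
    1 <= k <= n || throw(ArgumentError("k must be between 1 and the number of rows"))

    # Random initialization: pick k distinct rows as the starting centroids.
    centroids = float.(X[randperm(rng, n)[1:k], :])
    assignments = zeros(Int, n)

    for _ in 1:maxiter
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [argmin([sqdist(view(X, i, :), view(centroids, j, :)) for j in 1:k])
                           for i in 1:n]
        new_assignments == assignments && break   # nothing changed: converged
        assignments = new_assignments

        # Update step: move each centroid to the mean of its assigned points.
        for j in 1:k
            members = findall(==(j), assignments)
            isempty(members) && continue          # keep the old centroid if a cluster is empty
            centroids[j, :] = vec(mean(view(X, members, :), dims=1))
        end
    end
    return assignments, centroids
end
```

Calling `lloyd_kmeans(X, 2)` on a matrix with two well-separated groups of rows should place each group in its own cluster. Because the result depends on the random start, running the loop several times and keeping the best solution is a common safeguard.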

See the Algorithms Overview section of the documentation, which explains in detail how each algorithm works and lists the bibliography and papers used during the research and development of the code.

It's a great introduction to the algorithms and a good resource to read along with the source code.
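As a companion to that reading, the density-based idea behind DBSCAN can be sketched in a few lines. This is a naive O(n²) illustration under stated assumptions, not the package's implementation: `region_query`, `naive_dbscan`, and the parameter names `radius` and `min_pts` are hypothetical, distances are Euclidean, and the label `0` is used here to mean noise.

```julia
# Indices of all points within `radius` of point i (the point itself included).
function region_query(X::AbstractMatrix{<:Real}, i::Integer, radius::Real)
    return [j for j in axes(X, 1)
            if sqrt(sum(abs2, view(X, i, :) .- view(X, j, :))) <= radius]
end

# Hypothetical helper, NOT the ClusterAnalysis.jl API.
# Returns one label per row of X: 0 = noise, 1, 2, ... = cluster id.
function naive_dbscan(X::AbstractMatrix{<:Real}, radius::Real, min_pts::Integer)
    n = size(X, 1)
    labels = fill(-1, n)          # -1 marks a point that has not been visited yet
    cluster = 0
    for i in 1:n
        labels[i] == -1 || continue
        neighbors = region_query(X, i, radius)
        if length(neighbors) < min_pts
            labels[i] = 0         # provisionally noise; may become a border point later
            continue
        end
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = copy(neighbors)
        while !isempty(queue)
            j = popfirst!(queue)
            if labels[j] == 0
                labels[j] = cluster   # noise reachable from a core point is a border point
            end
            labels[j] == -1 || continue
            labels[j] = cluster
            j_neighbors = region_query(X, j, radius)
            # Only core points expand the cluster further.
            length(j_neighbors) >= min_pts && append!(queue, j_neighbors)
        end
    end
    return labels
end
```

Unlike K-Means, this approach does not need the number of clusters up front; it is controlled by the neighborhood radius and the minimum number of points that make a region dense. A spatial index would replace the brute-force `region_query` in any serious implementation.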

How to install ClusterAnalysis.jl

# Press ] to enter Pkg REPL mode.
julia> ]
pkg> add ClusterAnalysis

To-Do

  • [X] Add K-Means++ initialization, to go beyond the random initialization proposed by Andrew Ng. DONE
  • [X] Create DBSCAN algorithm. DONE
  • [ ] Create Hierarchical clustering algorithms with single, complete and average linkage options.
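
For readers curious about the K-Means++ item in the list above, the seeding step can be sketched as follows. This is an illustration of the general technique, not the package's code: the function name `kmeanspp_init` and its `rng` keyword are hypothetical, and the data is assumed to be a matrix with one observation per row. The first centroid is drawn uniformly at random; each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far.

```julia
using Random

# Hypothetical helper, NOT the ClusterAnalysis.jl API.
# Returns a k×d matrix of initial centroids picked from the rows of X.
function kmeanspp_init(X::AbstractMatrix{<:Real}, k::Integer;
                       rng::AbstractRNG=Random.default_rng())
    n = size(X, 1)
    1 <= k <= n || throw(ArgumentError("k must be between 1 and the number of rows"))

    chosen = [rand(rng, 1:n)]     # first centroid: uniformly at random

    # d2[i] = squared distance from point i to its nearest chosen centroid.
    d2 = [sum(abs2, view(X, i, :) .- view(X, chosen[1], :)) for i in 1:n]

    while length(chosen) < k
        # Sample an index with probability proportional to d2 (inverse-CDF sampling).
        c = cumsum(d2)
        threshold = rand(rng) * c[end]
        # Fallback to the last row only matters in degenerate cases
        # (for example, when every remaining point duplicates a chosen centroid).
        pick = something(findfirst(>(threshold), c), n)
        push!(chosen, pick)

        # Refresh the nearest-centroid distances with the new centroid.
        for i in 1:n
            d2[i] = min(d2[i], sum(abs2, view(X, i, :) .- view(X, pick, :)))
        end
    end
    return float.(X[chosen, :])
end
```

Points far from every existing centroid are more likely to be picked, which tends to spread the starting centroids across the data and reduces the chance of a poor random start.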