Agglomerative-Hierarchical-Clustering-from-scratch
Build an agglomerative hierarchical clustering algorithm from scratch, i.e. without any advanced libraries such as NumPy, pandas, or scikit-learn.
AgglomerativeHierarchicalClusterFromScratching
Agglomerative hierarchical clustering algorithm from scratch (i.e. without advanced libraries such as NumPy, pandas, or scikit-learn).
Algorithm
Initially, each data point forms its own cluster. During the clustering process, we iteratively merge the two most similar clusters until only $K$ clusters remain.
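The loop described above can be sketched in plain Python as follows. This is a minimal illustration, not necessarily how `main.py` is organised: the names `agglomerate` and `single_link` are hypothetical, single link is assumed as the default cluster distance, and `math.dist` (standard library, Python 3.8+) stands in for a hand-written point distance.

```python
import math


def single_link(a, b):
    """Assumed cluster distance: distance between the closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points, k, cluster_distance=single_link):
    """Merge the two closest clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Initialization: every data point forms its own cluster.
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = cluster_distance(clusters[i], clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; j > i, so removing j keeps i valid.
        clusters[i].extend(clusters.pop(j))
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, 3)
for cluster in result:
    print(cluster)
```

Each pass of this naive search compares every pair of clusters, which is simple to follow but slow on large inputs; caching the pairwise distances would be the usual next step.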
Cluster similarity measures
The similarity of two clusters $C_i, C_j$ is determined by a distance measure.
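Single link
$$D(C_i, C_j) = \min_{p \in C_i,\, q \in C_j} d(p, q)$$
Complete link
$$D(C_i, C_j) = \max_{p \in C_i,\, q \in C_j} d(p, q)$$
Average link
$$D(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$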
The smaller the distance is, the more similar the two clusters are.
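As a rough sketch, the three measures could be written in plain Python like this. The function names are hypothetical, average link is taken to be the mean of all pairwise point distances (the usual textbook definition), and `math.dist` from the standard library (Python 3.8+) is assumed as the point distance.

```python
import math


def single_link(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p in a for q in b)


def average_link(a, b):
    """Mean distance over all pairs of points, one from each cluster."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


# Two small one-dimensional clusters; pairwise distances are 3, 5, 2 and 4.
a = [(0.0,), (1.0,)]
b = [(3.0,), (5.0,)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))
```

Single link tends to chain elongated clusters together, while complete link favours compact ones; average link sits between the two.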
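In the equations, $d(p, q)$ is a distance measure between two data points $p$ and $q$, namely the Euclidean distance, defined by:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
where $p_i, q_i$ are the $i$-th coordinates of $p, q$ and $n$ is the number of dimensions.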
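The Euclidean distance needs no libraries at all. A minimal sketch, assuming points are given as equal-length sequences of numbers (the name `euclidean_distance` is illustrative):

```python
def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences of p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5


distance = euclidean_distance((0.0, 0.0), (3.0, 4.0))
print(distance)
```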
Sample usage
python main.py -d sample_input.txt -k 4 -m 0