Clusteval provides methods for unsupervised cluster validation

clusteval

clusteval is a Python package for evaluating detected clusters. It returns the cluster labels that score best on clustering tendency, number of clusters, and clustering quality. Three evaluation strategies are implemented: silhouette, dbindex, and derivative. Four clustering methods can be used: agglomerative, kmeans, dbscan, and hdbscan.
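To make the silhouette strategy concrete, here is a minimal standard-library sketch of the idea: score several candidate labelings of the same data and keep the one with the highest mean silhouette coefficient. This is not clusteval's implementation; the function name `silhouette_score`, the toy points, and the hand-written candidate labelings are assumptions made for illustration only.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes a score of 0.
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

# Hand-written candidate labelings for k = 2, 3 and 4 clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 0, 1, 1, 1, 2, 2, 3],
}

scores = {k: silhouette_score(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In this toy setup the three-cluster labeling scores highest, because merging two groups raises the within-cluster distances and splitting a group lowers the separation to the nearest other cluster.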

⭐️ Star this repo if you like it ⭐️

Blogs

1. A step-by-step guide for clustering images

2. Detection of Duplicate Images Using Image Hash Functions

3. From Data to Clusters: When is Your Clustering Good Enough?

4. From Clusters To Insights; The Next Step

Documentation pages

The documentation pages describe in detail how clusteval works and include many examples.

Installation

It is advisable to create a new environment (e.g. with Conda).
conda create -n env_clusteval python=3.8
conda activate env_clusteval
Install from PyPI
pip install clusteval
Import library
from clusteval import clusteval

Examples

A structured overview of all examples is available on the documentation pages.


Citation

Please cite clusteval in your publications if it is useful for your research (see the top right of the repository page for citation details).

Other interesting techniques/blogs

  • Use ARI when the ground truth clustering has large, equal-sized clusters
  • Use AMI when the ground truth clustering is unbalanced and contains small clusters
  • https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html
  • https://scikit-learn.org/stable/auto_examples/cluster/plot_adjusted_for_chance_measures.html#sphx-glr-auto-examples-cluster-plot-adjusted-for-chance-measures-py
  • https://github.com/idealo/imagededup
  • https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34
  • https://github.com/facebookresearch/deepcluster
  • https://towardsdatascience.com/pca-on-hyperspectral-data-99c9c5178385
  • https://machinelearningmastery.com/face-recognition-using-principal-component-analysis/
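For readers unfamiliar with the ARI mentioned in the list above, the following standard-library sketch computes the adjusted Rand index from the contingency counts of two labelings. It is meant only to show what "adjusted for chance" means; the function name `adjusted_rand_index` and the toy labelings are assumptions for illustration, and in practice the scikit-learn function linked above is the better choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must have the same length")

    # Pairs of samples placed together in both labelings (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each labeling separately (row and column sums).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: nothing to disagree on

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one single cluster
    return (sum_cells - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]    # same partition, different label names
alternating = [0, 1, 0, 1, 0, 1]  # partition unrelated to the truth

ari_same = adjusted_rand_index(truth, relabeled)
ari_alt = adjusted_rand_index(truth, alternating)
print(ari_same, ari_alt)
```

The score depends only on which samples are grouped together, not on the label names, so the relabeled partition scores 1.0. The alternating labeling agrees with the truth slightly less often than chance would, which gives a small negative value.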

Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • If you wish to buy me a coffee for this work, it is much appreciated :) Star the repo if you like it!