clusteringdatasets
An R package providing datasets useful for testing clustering algorithms
Clustering Datasets
An R repackaging of datasets useful for evaluating clustering methods. Most of them come from http://cs.joensuu.fi/sipu/datasets
I would love to include additional clustering datasets. If you have one to contribute, please send it along or open a PR.
Clustering Datasets
This vignette provides a simple overview of the datasets included in the package.
Birch

S Sets
The S-sets are useful for testing how an algorithm handles cluster overlap.

A Sets

Shapesets

Chameleon

Neural Gas

Non-Convex

Locations

High Dimensional Datasets
The package contains three sets of high-dimensional data. The
visualizations below were made by reducing each dataset to two
dimensions with my largeVis package, and the colors are the result of
applying the hdbscan function from that same package.
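For readers without largeVis installed, the general workflow (reduce to two dimensions, then color by cluster) can be sketched in base R. This is only an illustration under loud assumptions: PCA (`prcomp`) stands in for largeVis, k-means stands in for hdbscan, and the data are synthetic rather than one of the packaged datasets. It is not how the figures here were produced.

```r
set.seed(42)

# Synthetic stand-in for a high-dimensional dataset (an assumption, not
# packaged data): three Gaussian groups of 50 points in 20 dimensions.
n_per_group <- 50
group_means <- c(-4, 0, 4)
high_dim <- do.call(rbind, lapply(group_means, function(m) {
  matrix(rnorm(n_per_group * 20, mean = m), ncol = 20)
}))

# Step 1: reduce to two dimensions. PCA is used here in place of largeVis.
embedding <- prcomp(high_dim)$x[, 1:2]

# Step 2: assign cluster labels. k-means is used here in place of hdbscan,
# and it is run on the 2-D embedding for simplicity.
clusters <- kmeans(embedding, centers = 3, nstart = 10)$cluster

# Step 3: plot the embedding, colored by cluster label.
if (interactive()) plot(embedding, col = rainbow(3)[clusters])
```

Unlike hdbscan, k-means needs the number of clusters up front and never marks points as noise, so the coloring it produces on real data can differ noticeably.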
UCI Datasets

KDDCUP04Bio

Sklearn Toy Datasets
The Python sklearn.datasets module includes functions for creating
toy datasets. I’ve ported a few of them.
Make Blobs
library(clusteringdatasets)
blobs <- make_blobs()
plot(blobs$samples, col=rainbow(3)[blobs$labels])
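To show what a blob generator does conceptually, here is a minimal base R sketch. `sketch_blobs` and all of its parameters are hypothetical names made up for this illustration; it is not the package's `make_blobs` implementation, and the defaults are assumptions loosely modeled on the sklearn function. It returns a list with `samples` and `labels`, mirroring the structure used in the example above.

```r
# Hypothetical helper, not part of clusteringdatasets: draws isotropic
# Gaussian blobs around randomly placed centers, using base R only.
sketch_blobs <- function(n_samples = 100, n_features = 2, centers = 3,
                         cluster_sd = 1, center_box = c(-10, 10)) {
  # One row per center, drawn uniformly inside the bounding box.
  center_mat <- matrix(
    runif(centers * n_features, center_box[1], center_box[2]),
    nrow = centers, ncol = n_features
  )
  # Assign samples to centers as evenly as possible (labels are 1-based).
  labels <- rep_len(seq_len(centers), n_samples)
  # Each sample is its center plus independent Gaussian noise.
  noise <- matrix(rnorm(n_samples * n_features, sd = cluster_sd),
                  nrow = n_samples, ncol = n_features)
  samples <- center_mat[labels, , drop = FALSE] + noise
  list(samples = samples, labels = labels)
}

set.seed(1)
blob_sketch <- sketch_blobs()
if (interactive()) plot(blob_sketch$samples, col = rainbow(3)[blob_sketch$labels])
```

Raising `cluster_sd` relative to the width of `center_box` makes the blobs overlap more, which is a quick way to make the clustering problem harder.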

Make Moons
moons <- make_moons(noise=0.04)
plot(moons$samples, col=rainbow(2)[moons$labels])
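The two-moons shape itself is simple to construct: two half circles, one flipped and shifted so that they interleave, plus optional Gaussian noise. The base R sketch below follows the geometry used by sklearn's function of the same name. `sketch_moons` is a hypothetical name for this illustration and is not claimed to match the package's `make_moons` internals.

```r
# Hypothetical helper, not part of clusteringdatasets: two interleaving
# half circles with optional Gaussian noise, using base R only.
sketch_moons <- function(n_samples = 100, noise = 0) {
  n_outer <- n_samples %/% 2
  n_inner <- n_samples - n_outer
  t_outer <- seq(0, pi, length.out = n_outer)
  t_inner <- seq(0, pi, length.out = n_inner)
  # Upper half of the unit circle centered at the origin.
  outer_moon <- cbind(cos(t_outer), sin(t_outer))
  # Lower half circle, shifted right by 1 and up by 0.5 so the moons interleave.
  inner_moon <- cbind(1 - cos(t_inner), 0.5 - sin(t_inner))
  samples <- rbind(outer_moon, inner_moon)
  # Independent Gaussian jitter on every coordinate (sd = 0 adds nothing).
  samples <- samples + matrix(rnorm(length(samples), sd = noise), ncol = 2)
  list(samples = samples, labels = rep(1:2, c(n_outer, n_inner)))
}

set.seed(1)
moon_sketch <- sketch_moons(noise = 0.04)
if (interactive()) plot(moon_sketch$samples, col = rainbow(2)[moon_sketch$labels])
```

Because neither moon is convex, this shape is a common check for whether an algorithm can follow cluster structure that centroid-based methods tend to split incorrectly.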
