seriation icon indicating copy to clipboard operation
seriation copied to clipboard

Infrastructure for Ordering using Seriation - R Package

R package seriation - Infrastructure for Ordering Objects Using Seriation

CRAN
version stream r-universe
status CRAN RStudio mirror
downloads

This package provides the infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms (see below for a full list). Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

Available Seriation Method

The following methods are available for dissimilarity data:

  • ARSA - Simulated annealing (linear seriation)
  • Branch-and-bound to minimize the unweighted/weighted column gradient
  • DendSer - Dendrogram seriation heuristic to optimize various criteria
  • GA - Genetic algorithm with warm start to optimize various criteria
  • GW - Hierarchical clustering reordered by Gruvaeus and Wainer heuristic
  • HC - Hierarchical clustering (single link, avg. link, complete link)
  • Identity permutation
  • MDS - Multidimensional scaling (metric, non-metric, angle)
  • OLO - Hierarchical clustering with optimal leaf ordering
  • OPTICS - Ordering points to identify the clustering structure.
  • QAP - Quadratic assignment problem heuristic (2-SUM, linear seriation, inertia, banded anti-Robinson form)
  • R2E - Rank-two ellipse seriation
  • Random permutation
  • Spectral seriation (unnormalized, normalized)
  • SPIN - Sorting points into neighborhoods (neighborhood algorithm, side-to-site algorithm)
  • TSP - Traveling sales person solver to minimize the Hamiltonian path length
  • TSNE - Order of the 1D t-distributed stochastic neighbor embedding (t-SNE)
  • UMAP - Order of the 1D embedding produced by uniform manifold approximation and projection
  • VAT - Order of the visual assessment of clustering tendency ordering

A detailed comparison of the methods is available in the paper An experimental comparison of seriation methods for one-mode two-way data. (read the preprint).

The following methods are available for matrices:

  • BEA - Bond Energy Algorithm to maximize the measure of effectiveness (ME)
  • Identity permutation
  • PCA - First principal component or angle on the projection on the first two principal components
  • Random permutation
  • TSP - Traveling sales person solver to maximize ME

Installation

Stable CRAN version: Install from within R with

install.packages("seriation")

Current development version: Install from r-universe.

install.packages("seriation", repos = "https://mhahsler.r-universe.dev")

Usage

Load library, read data and calculate distances. Then use default seriation.

library(seriation)
data("SupremeCourt")

# joint probability of disagreement
SupremeCourt
##           Breyer Ginsburg Kennedy OConnor Rehnquist Scalia Souter Stevens
## Breyer      0.00    0.120    0.25    0.21      0.30  0.353  0.118    0.16
## Ginsburg    0.12    0.000    0.27    0.25      0.31  0.370  0.096    0.15
## Kennedy     0.25    0.267    0.00    0.16      0.12  0.188  0.248    0.33
## OConnor     0.21    0.252    0.16    0.00      0.16  0.207  0.220    0.33
## Rehnquist   0.30    0.308    0.12    0.16      0.00  0.143  0.293    0.40
## Scalia      0.35    0.370    0.19    0.21      0.14  0.000  0.338    0.44
## Souter      0.12    0.096    0.25    0.22      0.29  0.338  0.000    0.17
## Stevens     0.16    0.145    0.33    0.33      0.40  0.438  0.169    0.00
## Thomas      0.36    0.368    0.18    0.21      0.14  0.066  0.331    0.44
##           Thomas
## Breyer     0.359
## Ginsburg   0.368
## Kennedy    0.177
## OConnor    0.205
## Rehnquist  0.137
## Scalia     0.066
## Souter     0.331
## Stevens    0.436
## Thomas     0.000
d <- as.dist(SupremeCourt)
order <- seriate(d)
order
## object of class 'ser_permutation', 'list'
## contains permutation vectors for 1-mode data
## 
##   vector length seriation method
## 1             9         Spectral
pimage(d, diag = TRUE, upper = TRUE, main = "judges in original alphabetical order")
pimage(d, order, diag = TRUE, upper = TRUE, main = "judges reordered by seriation")

Compare quality.

rbind(alphabetical = criterion(d), seriated = criterion(d, order))
##              2SUM AR_deviations AR_events BAR Cor_R Gradient_raw
## alphabetical  872        10.304        80 1.8 0.024            8
## seriated      811         0.064         5 1.1 0.085          158
##              Gradient_weighted Inertia Lazy_path_length Least_squares LS  ME
## alphabetical              0.54     267              6.9           967 99  99
## seriated                 19.76     364              4.6           942 86 101
##              Moore_stress Neumann_stress Path_length RGAR
## alphabetical          7.0            3.9         1.8 0.48
## seriated              2.5            1.3         1.1 0.03

References