rust-kmedoids icon indicating copy to clipboard operation
rust-kmedoids copied to clipboard

Add CLARANS and FastCLARANS

Open kno10 opened this issue 1 year ago • 0 comments

As these methods require distance functions and a data matrix, it makes sense to first implement CLARA #5

CLARANS modifies PAM with a random sampling follows:

  • choose a subset of medoid + non-medoid pairs uniformly
  • compute the best swap in this subset only
  • perform the best swap

hence it approximates PAM.

We can easily (c.f., the FastPAM papers) integrate the FastPAM optimizations:

FastCLARANS:

  • choose a uniform subset of non-medoids
  • compute the loss regarding all medoids in O(N) per non-medoid using the FastPAM optimizations

It makes less sense to also integrate the "Eager" optimizations - eventually we can just run FasterPAM on shuffled input with a limit on the number of distance computations to achieve a similar approximation. Hence I do not believe CLARANS and FastCLARANS are of much practical relevance anymore.

The FastCLARANS reference implementation in Java is in ELKI: FastCLARANS but this contains data structures optimized for Java that we can do differently in Rust without the same overhead.

kno10 avatar Dec 11 '23 08:12 kno10