rust-kmedoids
Add CLARA, FastCLARA, FasterCLARA
CLARA roughly works as follows:
- subsample the data
- run PAM (FastCLARA: FastPAM, FasterCLARA: FasterPAM) on the sample
- compute the total deviation on the entire data set for these medoids
- return the best result found with multiple subsamples
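The steps above can be sketched as follows. This is a minimal illustration, not this package's implementation: the names `clara`, `naive_swap_pam`, and `total_deviation` are hypothetical, and the simple greedy swap search is only a stand-in for PAM/FastPAM/FasterPAM. It assumes the data is a plain Python list and the distance is any callable on two items.

```python
import random


def total_deviation(data, medoids, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(x, data[m]) for m in medoids) for x in data)


def naive_swap_pam(dmat, k):
    """Greedy swap search on a precomputed matrix.

    Stand-in for PAM / FasterPAM; much slower, used here for illustration only.
    Returns medoid positions relative to the matrix.
    """
    n = len(dmat)
    medoids = list(range(k))  # arbitrary initialization

    def cost(meds):
        return sum(min(row[m] for m in meds) for row in dmat)

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best:  # accept any strictly improving swap
                    medoids, best, improved = trial, c, True
    return medoids


def clara(data, k, dist, sample_size, n_samples, seed=0):
    """CLARA sketch: cluster subsamples, score each result on the full data."""
    if sample_size < k:
        raise ValueError("sample_size must be at least k")
    rng = random.Random(seed)
    best_medoids, best_td = None, float("inf")
    for _ in range(n_samples):
        # 1. subsample the data
        idx = rng.sample(range(len(data)), min(sample_size, len(data)))
        # 2. distance matrix on the sample only, then k-medoids on it
        dmat = [[dist(data[i], data[j]) for j in idx] for i in idx]
        local = naive_swap_pam(dmat, k)
        medoids = [idx[m] for m in local]  # map back to indices in data
        # 3. total deviation on the entire data set
        td = total_deviation(data, medoids, dist)
        # 4. keep the best result over all subsamples
        if td < best_td:
            best_medoids, best_td = medoids, td
    return best_medoids, best_td
```

A call such as `clara(points, 2, lambda a, b: abs(a - b), sample_size=4, n_samples=5)` returns the medoid indices into `points` together with their total deviation on the full data.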
This may seem like a trivial addition at first (and it would indeed be only a few lines in the Python wrapper), but:
- this package currently does not include any distance functions; it operates on precomputed distance matrices only
- if you already have the distance matrix, just use FasterPAM and you will be fine
- a meaningful implementation of these computes the distance matrix only on the subsample, which requires a data matrix and distance functions as input
- for many users it will still be more convenient to handle the subset/sample within their own application
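To make the cost argument concrete, the following sketch counts distance evaluations for a full pairwise matrix versus a CLARA-style run. The function names are hypothetical and the parameter values in the usage note are purely illustrative; the CLARA count is a simple estimate that assumes every point is compared against all k medoids once per subsample.

```python
def full_matrix_evaluations(n):
    """Distance evaluations for a symmetric n x n matrix (diagonal skipped)."""
    return n * (n - 1) // 2


def clara_evaluations(n, k, sample_size, n_samples):
    """Rough count for CLARA: sample matrix plus scoring on the full data."""
    per_sample_matrix = sample_size * (sample_size - 1) // 2
    per_sample_scoring = n * k  # every point against each of the k medoids
    return n_samples * (per_sample_matrix + per_sample_scoring)
```

For example, with n = 100,000 points, k = 10, a sample size of 60, and 5 subsamples, the full matrix needs 4,999,950,000 evaluations while the CLARA estimate is 5,008,850. This saving disappears if the full distance matrix has to be computed up front anyway.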
Hence, a rough implementation plan would be:
- design an API for computing distances compatible with typical users (python wrapper, rust native users)
- implement a decent choice of distance functions
- implement CLARA
- tests
- update the Python wrapper
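As one possible shape for the distance API on the Python side, a caller could pass either a metric name or an arbitrary callable. Everything below is a hypothetical sketch and not an existing interface of this package: the `DISTANCES` registry and `resolve_distance` are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical registry of built-in distance functions.
DISTANCES = {"euclidean": euclidean, "manhattan": manhattan}


def resolve_distance(metric):
    """Accept a registered name or any callable (a, b) -> float."""
    if callable(metric):
        return metric
    try:
        return DISTANCES[metric]
    except KeyError:
        raise ValueError(f"unknown distance: {metric!r}") from None
```

Accepting callables keeps user-defined distances possible, while named built-ins would leave room for a native implementation to handle the common cases without calling back into Python.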
Adding distance functions will also be necessary for CLARANS (#6), BanditPAM (#2), or coreset approaches (#4).