scikit-learn-extra
Add example where KMedoids is better than existing scikit-learn clustering algorithms
From @rth in #12:
A few more comments @zdog234, otherwise (after a light review) LGTM.
We adopted black for code style recently. Please run
black sklearn_extra/ examples/
to fix the linter CI. I would rather we merged this and opened follow-up issues than keep this PR open until everything is perfect.
Maybe @jeremiedbb who worked on KMeans lately would also have some comments.
Later it would be nice to add an example on some dataset where KMedoids is better than existing scikit-learn clustering algorithms, as discussed in scikit-learn/scikit-learn#11099 (comment)
The current code implements an inferior algorithm, so for now I'd rather suggest comparing against the results of non-Python implementations (R, ELKI, pip install kmedoids) if you want to study result quality.
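For reference, a minimal sketch of what such a quality comparison could look like, assuming the pip kmedoids package's FasterPAM entry point and a precomputed dissimilarity matrix (the exact result attributes follow that package's documentation as I understand it, so treat this as an illustration rather than a verified recipe):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances
from sklearn_extra.cluster import KMedoids
import kmedoids  # pip install kmedoids

# Small synthetic dataset and its precomputed dissimilarity matrix.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
diss = pairwise_distances(X)

# The scikit-learn-extra estimator discussed in this issue.
ske = KMedoids(n_clusters=3, metric="precomputed", random_state=0).fit(diss)

# FasterPAM from the kmedoids package, used here as a quality reference.
ref = kmedoids.fasterpam(diss, 3)

# Compare the final loss (sum of dissimilarities to the assigned medoid).
print("sklearn-extra inertia:", ske.inertia_)
print("fasterpam loss:       ", ref.loss)
```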
KMedoids can be better than KMeans for robustness purposes, for example. See this figure, where KMedoids gives a really good result while KMeans detects the outliers as a class of their own (the data consist of 3 Gaussian blobs and an "outlier" group situated far away from these blobs). I don't know many clustering algorithms that exhibit this kind of robustness; in fact, KMedoids is a little more stable on this example than the algorithm I wrote specifically to be robust (the second figure). This example could be added to the doc, I think.
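A rough sketch of that kind of setup (a hypothetical reconstruction with made-up blob and outlier parameters, not the actual code from PR #42): three Gaussian blobs plus a small far-away outlier group, clustered with both KMeans and KMedoids so the cluster sizes can be compared.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn_extra.cluster import KMedoids

rng = np.random.RandomState(42)

# Three well-separated Gaussian blobs ...
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
# ... plus a small group of outliers far away from the blobs.
outliers = rng.normal(loc=50.0, scale=1.0, size=(10, 2))
X = np.vstack([X, outliers])

# KMeans may spend one of its 3 centers on the outlier group and merge two
# of the true blobs; KMedoids, whose centers must be actual data points,
# is reported in this thread to be much less affected on such data.
km_labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
kmed_labels = KMedoids(n_clusters=3, random_state=0).fit_predict(X)

print("KMeans cluster sizes:  ", np.bincount(km_labels))
print("KMedoids cluster sizes:", np.bincount(kmed_labels))
```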
That would be great! Do you already have the code for that example @TimotheeMathieu ?
Yes, in fact it is an example I came up with for PR #42; you can find it here. I just added KMedoids with default parameters and got the result displayed. Maybe it would be interesting to change the doc page I made to include KMedoids, because KMedoids is in fact robust. I will try making a PR for this if that's alright with you.
That would be great, thank you!
In general, if you see other things to improve in this repo, don't hesitate to submit PRs; we are actively looking for maintainers :)