scikit-learn-extra
Add example where KMedoids is better than existing scikit-learn clustering algorithms
From @rth in #12:
A few more comments @zdog234, otherwise (after a light review) LGTM.
We adopted black for code style recently. Please run
black sklearn_extra/ examples/
to fix the linter CI. I would rather we merged this and opened follow-up issues than keep this PR open until everything is perfect.
Maybe @jeremiedbb who worked on KMeans lately would also have some comments.
Later it would be nice to add an example on some dataset where KMedoids is better than existing scikit-learn clustering algorithms, as discussed in scikit-learn/scikit-learn#11099 (comment)
The current code implements an inferior algorithm, so for now I'd rather suggest comparing against the results of non-Python implementations (R, ELKI, pip install kmedoids) if you want to study result quality.
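For reference, a minimal sketch of what such a quality comparison could look like, assuming the pip kmedoids package's FasterPAM entry point and a precomputed dissimilarity matrix (the exact result attributes follow that package's documentation as I understand it, so treat this as an illustration rather than a verified recipe):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances
from sklearn_extra.cluster import KMedoids
import kmedoids  # pip install kmedoids

# Small synthetic dataset and its precomputed dissimilarity matrix.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
diss = pairwise_distances(X)

# The scikit-learn-extra estimator discussed in this issue.
ske = KMedoids(n_clusters=3, metric="precomputed", random_state=0).fit(diss)

# FasterPAM from the kmedoids package, used here as a quality reference.
ref = kmedoids.fasterpam(diss, 3)

# Compare the final loss (sum of dissimilarities to the assigned medoid).
print("sklearn-extra inertia:", ske.inertia_)
print("fasterpam loss:       ", ref.loss)
```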
KMedoids can be better than KMeans for robustness purposes, for example. See this figure, where KMedoids gives a really good result while KMeans detects the outliers as a class of their own (the data consist of 3 Gaussian blobs and an "outlier" group situated far away from these blobs). I don't know many clustering algorithms that exhibit this kind of robustness; in fact, KMedoids is a little more stable on this example than the algorithm I wrote specifically to be robust (the second figure). This example could be added to the doc, I think.
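A rough sketch of that kind of setup (a hypothetical reconstruction with made-up blob and outlier parameters, not the actual code from PR #42): three Gaussian blobs plus a small far-away outlier group, clustered with both KMeans and KMedoids so the cluster sizes can be compared.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn_extra.cluster import KMedoids

rng = np.random.RandomState(42)

# Three well-separated Gaussian blobs ...
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
# ... plus a small group of outliers far away from the blobs.
outliers = rng.normal(loc=50.0, scale=1.0, size=(10, 2))
X = np.vstack([X, outliers])

# KMeans may spend one of its 3 centers on the outlier group and merge two
# of the true blobs; KMedoids, whose centers must be actual data points,
# is reported in this thread to be much less affected on such data.
km_labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
kmed_labels = KMedoids(n_clusters=3, random_state=0).fit_predict(X)

print("KMeans cluster sizes:  ", np.bincount(km_labels))
print("KMedoids cluster sizes:", np.bincount(kmed_labels))
```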
That would be great! Do you already have the code for that example @TimotheeMathieu ?
Yes, in fact it is an example I came up with for PR #42; you can find it here. I just added KMedoids with default parameters and got the result displayed. Maybe it would be interesting to change the doc page I made to include KMedoids, because KMedoids is in fact robust. I will try making a PR for this if that's alright with you.
That would be great, thank you!
In general, if you see other things to improve in this repo, don't hesitate to submit PRs; we are actively looking for maintainers :)