node-kmeans

For the same set of data, the centroids vary on every new run

Open • krishnakumar85 opened this issue 11 years ago • 8 comments

For each new run of node-kmeans on the same set of data, the clusters and centroids vary. Is there any way to fix these inconsistent results, perhaps by starting with a constant seed?
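One way to get repeatable runs is to drive the random initialization from a seeded generator, since JavaScript's `Math.random()` cannot be seeded. The sketch below assumes the only random step in a run is the choice of starting centroids; `mulberry32` and `pickInitialCentroids` are hypothetical helpers written for illustration, not part of node-kmeans.

```javascript
// Hypothetical helpers for illustration; node-kmeans does not necessarily expose a seed option.

// mulberry32: a small seeded 32-bit PRNG. The same seed always yields the same sequence.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Pick k distinct data points as starting centroids, driven only by the seed.
function pickInitialCentroids(data, k, seed) {
  if (k > data.length) {
    throw new RangeError("k cannot exceed the number of data points");
  }
  const rand = mulberry32(seed);
  const indices = data.map((_, i) => i);
  // Partial Fisher-Yates shuffle: after k steps, the first k slots hold k distinct random indices.
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (indices.length - i));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  // Copy the chosen points so later centroid updates cannot mutate the input data.
  return indices.slice(0, k).map((i) => data[i].slice());
}
```

Calling `pickInitialCentroids(pixels, 3, 2013)` twice returns the same starting centroids, so a deterministic assign-and-average loop started from them produces the same clusters every time. A fixed seed makes runs repeatable; it does not by itself bring the result any closer to the best possible clustering.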

krishnakumar85 • Sep 25 '13 07:09

I'm also seeing this problem. It appears to generate new centroids on every run, even on identical data.

listonb • Nov 07 '14 16:11

Are there a lot of local minima in your data set?

Philmod • Nov 07 '14 17:11

Yes. This is RGB pixel color data from an image.

listonb • Nov 07 '14 18:11

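Yes, that's linked to your problem: k-means starts from random centroids, and depending on where it starts it can settle into a different local minimum.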

Finding the global minimum of the k-means problem is NP-hard in general.
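For reference, the quantity k-means minimizes can be written as below, for points $x_1,\dots,x_n$ and centers $\mu_1,\dots,\mu_k$ (this notation is added here, not taken from the thread). The standard assign-then-average iteration never increases it, so a run stops at whichever local minimum its starting centers lead to. The two extra lines are a made-up 1-D toy example with $k = 2$ and points $\{0, 1, 10, 11, 20, 21\}$.

```latex
J(\mu_1,\dots,\mu_k) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2

\{0,1\} \mid \{10,11,20,21\}:\quad \mu = (0.5,\ 15.5),\quad J = 2(0.25) + 2(20.25) + 2(30.25) = 101.5

\{0,1,10\} \mid \{11,20,21\}:\quad \mu = (11/3,\ 52/3),\quad J = 2 \cdot \tfrac{546}{9} = \tfrac{364}{3} \approx 121.3
```

In both partitions every point is already closest to its own cluster's center, so the iteration stops at either one; a run that happens to start near the second partition ends up with the worse value of $J$.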

Philmod • Nov 07 '14 19:11

Any easy fix?

listonb • Nov 07 '14 19:11

This is on the to-do list.

I think it can be solved in a few different ways:

  • replicates: trying many random starting points and merging
  • adding some randomness

I'd be happy if you created a pull request with a solution.

Thanks, Philmod

Philmod • Nov 07 '14 19:11

I appreciate the time. I'll try to look into it after next week if I have some time!

listonb • Nov 07 '14 19:11

One of the solutions used in sklearn is based on the inertia:

  1. Run k-means many times with different initializations
  2. For each result, compute the inertia
  3. Keep the result with the lowest inertia

Note about inertia (from sklearn): Sum of squared distances of samples to their closest cluster center.
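The three steps above can be sketched in plain JavaScript as follows. This is a minimal sketch, assuming points are arrays of numbers (for example `[r, g, b]` pixels) and that each run is plain Lloyd iteration started from k distinct randomly chosen points; `mulberry32`, `lloyd`, `inertia` and `bestOfN` are hypothetical names, not node-kmeans's API.

```javascript
// Hypothetical helpers for illustration; not node-kmeans's API.

// Seeded PRNG (mulberry32) so the whole multi-run procedure is repeatable.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function squaredDistance(a, b) {
  let sum = 0;
  for (let d = 0; d < a.length; d++) {
    const diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// Index of the centroid closest to `point` (ties go to the lower index).
function nearest(point, centroids) {
  let best = 0;
  let bestDist = Infinity;
  for (let c = 0; c < centroids.length; c++) {
    const dist = squaredDistance(point, centroids[c]);
    if (dist < bestDist) {
      bestDist = dist;
      best = c;
    }
  }
  return best;
}

// Inertia: sum of squared distances of samples to their closest cluster center.
function inertia(data, centroids) {
  return data.reduce((sum, p) => sum + squaredDistance(p, centroids[nearest(p, centroids)]), 0);
}

// One k-means run (Lloyd iteration) from k distinct randomly chosen data points.
function lloyd(data, k, rand, maxIter = 100) {
  if (k > data.length) throw new RangeError("k cannot exceed the number of data points");
  const indices = data.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (indices.length - i));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  let centroids = indices.slice(0, k).map((i) => data[i].slice());
  let labels = new Array(data.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    const next = data.map((p) => nearest(p, centroids));
    const changed = next.some((c, i) => c !== labels[i]);
    labels = next;
    if (!changed) break; // assignments are stable: a local minimum
    // Move each centroid to the mean of its points; keep the old one if its cluster is empty.
    centroids = centroids.map((old, c) => {
      const members = data.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old;
      return old.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
    });
  }
  labels = data.map((p) => nearest(p, centroids)); // labels that match the final centroids
  return { centroids, labels, inertia: inertia(data, centroids) };
}

// Run k-means nInit times with different initializations; keep the lowest-inertia result.
function bestOfN(data, k, nInit = 10, seed = 42) {
  const rand = mulberry32(seed);
  let best = null;
  for (let run = 0; run < nInit; run++) {
    const result = lloyd(data, k, rand);
    if (best === null || result.inertia < best.inertia) best = result;
  }
  return best;
}
```

With a fixed `seed` the whole procedure is repeatable, which also covers the original question about stable results, and raising `nInit` trades run time for a better chance of landing on a low-inertia solution. scikit-learn's `KMeans` exposes the same idea through its `n_init` and `random_state` parameters.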

@Philmod I can do a PR.

Morikko • Aug 30 '18 13:08