node-kmeans
For the same set of data, the centroids vary on each new run
For each new run of node-kmeans on the same set of data, the clusters and centroids vary. Is there any way to fix the skewed results, or perhaps start with a constant seed?
I'm also seeing this problem. It appears to generate new centroids on every run over identical data.
Are there a lot of local minima in your data set?
Yes. This is pixel RGB color data from an image
Yes, that's linked to your problem.
Finding the global minimum of the k-means problem is NP-hard in general.
Any easy fix?
This is a to-do item.
I think it can be solved in a few different ways:
- replicates: trying many random starting points and merging
- adding some randomness
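For the constant-seed idea raised in the question, here is a minimal sketch of reproducible random starting points. It does not use node-kmeans at all: `mulberry32` is a small, commonly used seeded generator, and `pickInitialCentroids` is a hypothetical helper name chosen for illustration. It assumes `k` is no larger than the number of points.

```javascript
// Small seeded PRNG (mulberry32); returns a function yielding floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: pick k distinct points as starting centroids
// using a partial Fisher-Yates shuffle driven by the supplied generator.
function pickInitialCentroids(points, k, rand) {
  const idx = points.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (idx.length - i));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  // Copy the chosen points so later updates do not mutate the input.
  return idx.slice(0, k).map((i) => points[i].slice());
}

// The same seed always yields the same starting centroids.
const pixels = [[255, 0, 0], [250, 5, 5], [0, 255, 0], [0, 0, 255], [5, 5, 250]];
const startA = pickInitialCentroids(pixels, 2, mulberry32(42));
const startB = pickInitialCentroids(pixels, 2, mulberry32(42));
console.log(JSON.stringify(startA) === JSON.stringify(startB));
```

A fixed seed only makes the result repeatable. It does not make it better, since the run can still settle in a poor local minimum.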
I'm happy if you create a Pull Request with a solution.
Thanks, Philmod
Appreciate the time. I'll try to look into it after next week if I have some time!
One of the solutions used in sklearn is to use the inertia:
- Run k-means many times with different initializations
- For each result, compute the inertia
- Keep the result with the lowest inertia
Note about inertia (from sklearn): Sum of squared distances of samples to their closest cluster center.
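The steps above could be sketched roughly as follows. This is a standalone illustration, not node-kmeans code or a proposed patch: `lloyd`, `inertia`, and `bestOfReplicates` are hypothetical names, the k-means loop is a plain Lloyd iteration, and the caller is assumed to supply the list of starting centroid sets.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function sqDist(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return s;
}

// Index of the centroid closest to point p (ties go to the lower index).
function nearest(p, centroids) {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) best = c;
  }
  return best;
}

// Inertia: sum of squared distances of samples to their closest centroid.
function inertia(points, centroids) {
  return points.reduce(
    (sum, p) => sum + sqDist(p, centroids[nearest(p, centroids)]), 0);
}

// Plain Lloyd iterations from the given starting centroids.
function lloyd(points, start, maxIter = 100) {
  let centroids = start.map((c) => c.slice());
  for (let iter = 0; iter < maxIter; iter++) {
    const sums = centroids.map((c) => c.map(() => 0));
    const counts = centroids.map(() => 0);
    for (const p of points) {
      const c = nearest(p, centroids);
      counts[c]++;
      p.forEach((v, d) => { sums[c][d] += v; });
    }
    // Empty clusters keep their previous centroid.
    const next = centroids.map((c, i) =>
      counts[i] === 0 ? c : sums[i].map((s) => s / counts[i]));
    const moved = next.some((c, i) => sqDist(c, centroids[i]) > 0);
    centroids = next;
    if (!moved) break;
  }
  return centroids;
}

// Run Lloyd once per starting set and keep the lowest-inertia result.
function bestOfReplicates(points, starts) {
  let best = null;
  for (const start of starts) {
    const centroids = lloyd(points, start);
    const score = inertia(points, centroids);
    if (best === null || score < best.inertia) {
      best = { centroids, inertia: score };
    }
  }
  return best;
}

// Two starts: the first gets stuck in a poor local minimum (inertia 100),
// the second finds the two obvious groups (inertia 1).
const points = [[0, 0], [0, 1], [10, 0], [10, 1]];
const starts = [[[5, 0], [5, 1]], [[0, 0], [10, 0]]];
const result = bestOfReplicates(points, starts);
console.log(result.inertia, JSON.stringify(result.centroids));
// prints: 1 [[0,0.5],[10,0.5]]
```

Because the best run is chosen by inertia rather than by order, the outcome is stable as long as at least one start reaches the good solution.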
@Philmod I can do a PR