php-kmeans

Get total variation (Elbow method)

Open · bdelespierre opened this issue 2 years ago · 4 comments

To help find the best value for K (the number of clusters), it would be nice to get the variance of the distances between clustered points and their cluster's centroid.

Inspired by https://www.youtube.com/watch?v=4b5d3muPQmA
Also see https://en.wikipedia.org/wiki/Elbow_method_(clustering)
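A minimal sketch of what such a total could compute, assuming the usual elbow-method quantity: the within-cluster sum of squared Euclidean distances (WCSS). The helper names `squaredDistance` and `totalVariance`, and the representation of clusters as arrays of points with a matching array of centroids, are hypothetical and not part of php-kmeans.

```php
<?php
// Hypothetical helper: squared Euclidean distance between two points
// given as equal-length arrays of floats.
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $diff * $diff;
    }
    return $sum;
}

// Hypothetical: within-cluster sum of squares (WCSS).
// $clusters[$k] is a list of points; $centroids[$k] is the centroid of cluster $k.
function totalVariance(array $clusters, array $centroids): float
{
    $total = 0.0;
    foreach ($clusters as $k => $points) {
        foreach ($points as $point) {
            $total += squaredDistance($point, $centroids[$k]);
        }
    }
    return $total;
}

$clusters = [
    [[1.0, 1.0], [3.0, 1.0]],     // centroid (2, 1)
    [[10.0, 10.0], [10.0, 12.0]], // centroid (10, 11)
];
$centroids = [[2.0, 1.0], [10.0, 11.0]];
echo totalVariance($clusters, $centroids), PHP_EOL; // prints 4
```

Dividing the result by the number of points gives a mean squared distance instead; either works for an elbow plot, since dividing by a constant does not change the shape of the curve over K.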

I also believe the current v3 implementation of RandomInitialization is wrong :man_shrugging:

Proposed change

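```php
// $init is the chosen initialization scheme, $points the dataset, $K the number of clusters
$result = (new Kmeans\Algorithm($init))->clusterize($points, $K);
echo $result->getTotalVariance(); // proposed method
```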

bdelespierre · Sep 11 '21 08:09

See also https://stackoverflow.com/questions/6645895/calculating-the-percentage-of-variance-measure-for-k-means
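The linked question is about the "percentage of variance explained" measure. A minimal sketch, assuming it is computed as between-cluster sum of squares over total sum of squares, which equals `1 - WCSS / TSS` when each centroid is the mean of its cluster. The function names `sqDist`, `meanPoint` and `varianceExplained` are hypothetical, not php-kmeans API.

```php
<?php
// Squared Euclidean distance between two equal-length float arrays.
function sqDist(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return $sum;
}

// Component-wise mean of a non-empty list of points.
function meanPoint(array $points): array
{
    $dims = count($points[0]);
    $mean = array_fill(0, $dims, 0.0);
    foreach ($points as $p) {
        for ($d = 0; $d < $dims; $d++) {
            $mean[$d] += $p[$d];
        }
    }
    return array_map(fn ($s) => $s / count($points), $mean);
}

// Hypothetical: fraction of the total variance explained by a clustering.
// $clusters is a list of non-empty clusters, each a list of points.
// Assumes the points are not all identical (otherwise TSS is 0).
function varianceExplained(array $clusters): float
{
    $all = array_merge(...$clusters);
    $grandMean = meanPoint($all);

    $tss = 0.0; // total sum of squares around the grand mean
    foreach ($all as $p) {
        $tss += sqDist($p, $grandMean);
    }

    $wcss = 0.0; // within-cluster sum of squares around each cluster mean
    foreach ($clusters as $points) {
        $centroid = meanPoint($points);
        foreach ($points as $p) {
            $wcss += sqDist($p, $centroid);
        }
    }

    return 1.0 - $wcss / $tss;
}

$clusters = [
    [[1.0, 1.0], [3.0, 1.0]],
    [[10.0, 10.0], [10.0, 12.0]],
];
printf("%.4f\n", varianceExplained($clusters)); // 0.9762
```

Here TSS is 168 and WCSS is 4, so the clustering explains 164/168 of the variance. Unlike raw WCSS, this ratio is bounded by 0 and 1, which makes curves for different datasets easier to compare.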

bdelespierre · Sep 13 '21 09:09

The elbow method is quite expensive to implement. getTotalVariance() is the right building block for it, but the full method needs more than that. K-means produces different results depending on the initial centroid positions, so the elbow position can differ from one run to the next. We would also need a policy for averaging that elbow across runs.
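To illustrate the run-to-run variation described above, here is a minimal, self-contained sketch that is not php-kmeans code: a plain Lloyd's k-means with random initialization (K distinct input points chosen as starting centroids), run several times per K, with the WCSS averaged across runs. Averaging is only one possible policy; keeping the minimum WCSS per K is another. The function names, the iteration cap and the number of runs are assumptions.

```php
<?php
// Squared Euclidean distance between two equal-length float arrays.
function sqDist(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return $sum;
}

// One run of Lloyd's k-means; returns the WCSS of the final partition.
// Requires 1 <= $k <= count($points).
function kmeansWcss(array $points, int $k, int $maxIter = 100): float
{
    // Random initialization: K distinct input points as starting centroids.
    // array_rand returns a single key when $k is 1, hence the cast.
    $keys = (array) array_rand($points, $k);
    $centroids = array_map(fn ($i) => $points[$i], $keys);
    $assign = [];

    for ($iter = 0; $iter < $maxIter; $iter++) {
        // Assignment step: attach each point to its nearest centroid.
        $newAssign = [];
        foreach ($points as $i => $p) {
            $best = 0;
            $bestDist = INF;
            foreach ($centroids as $c => $centroid) {
                $dist = sqDist($p, $centroid);
                if ($dist < $bestDist) {
                    $best = $c;
                    $bestDist = $dist;
                }
            }
            $newAssign[$i] = $best;
        }
        if ($newAssign === $assign) {
            break; // converged: no point changed cluster
        }
        $assign = $newAssign;

        // Update step: move each centroid to the mean of its members.
        foreach ($centroids as $c => $centroid) {
            $members = array_keys($assign, $c, true);
            if ($members === []) {
                continue; // empty cluster: keep its previous centroid
            }
            foreach (array_keys($centroid) as $d) {
                $sum = 0.0;
                foreach ($members as $i) {
                    $sum += $points[$i][$d];
                }
                $centroids[$c][$d] = $sum / count($members);
            }
        }
    }

    $wcss = 0.0;
    foreach ($points as $i => $p) {
        $wcss += sqDist($p, $centroids[$assign[$i]]);
    }
    return $wcss;
}

// Mean WCSS over $runs random restarts for each K in 1..$maxK.
function elbowCurve(array $points, int $maxK, int $runs = 10): array
{
    $curve = [];
    for ($k = 1; $k <= $maxK; $k++) {
        $sum = 0.0;
        for ($r = 0; $r < $runs; $r++) {
            $sum += kmeansWcss($points, $k);
        }
        $curve[$k] = $sum / $runs;
    }
    return $curve;
}

$points = [[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]];
foreach (elbowCurve($points, 4) as $k => $wcss) {
    printf("K=%d  mean WCSS=%.3f\n", $k, $wcss);
}
```

For K = 1 every run ends with the single centroid at the grand mean, so that entry is deterministic; for larger K the individual runs can land in different local optima, which is exactly why some averaging (or minimum) policy is needed before locating an elbow.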

battlecook · Sep 14 '21 15:09

Within this ticket's scope, calculating the elbow point is someone else's problem. We're just providing the variance here :wink:

bdelespierre · Sep 15 '21 12:09

Oh, that's right. Understood, great.

battlecook · Sep 15 '21 14:09