spherecluster
spherecluster copied to clipboard
Initialization is using euclidean distance
https://github.com/jasonlaska/spherecluster/blob/701b0b1909088a56e353b363b2672580d4fe9d93/spherecluster/spherical_kmeans.py#L44-L48
I might be getting this wrong, but the code here seems to be using initialization function from sklearn. This could cause issue since the kmeans++ initialization in sklearn is based on euclidean distance. It should be replaced with cosine distance.
Thank you, I'll look into it.
May I ask what the status of this issue is? I looked into the commits log but to no avail.
I am currently using init="random"
as a safer option, and would like to switch to k-means++
if this issue is resolved.
Many thanks!
Hi @rtrad89 or @t103z I'm not sure how much this will actually matter since this is just used for intiailization (and sklearn has chosen not to provide a distance-configurable version of kmeans++).
The code for the initialization is here: https://github.com/scikit-learn/scikit-learn/blob/0.22.X/sklearn/cluster/_k_means.py#L41, do you want to take a stab at a PR to port it for cosine dist?