fast-kmeans Sort-means appears to have a bug

Sort-means is giving slightly different results in some of my testing, compared to all the other algorithms. I have a test case where every other algorithm converges in 45 iterations, but it takes 42, and gives slightly different results (centers).

Steps to reproduce:

run naive k-means and sort-means on the attached input, with k=10, using the library's k-means++ initialization, with a max iterations of 1000
expected result: same number of iterations and centers
actual result: different centers, different iterations

The initial centers chosen by k-means++ on this dataset are:

    -0.379863       1.80735      0.658205      0.236417       1.52649 
    -0.456454      -0.44091      0.932621        1.4306      -0.77737 
      1.10611      -1.30305     -0.202388      -1.29549      0.150969 
     0.951702      0.885737    -0.0140388     -0.295967      -2.10619 
      2.52547      0.803819       1.57157      0.481766      0.168094 
     -0.35617      0.423922       0.72442     -0.557642      0.466705 
     0.619142      -1.08521      -1.85976      -1.19759     -0.763672 
    -0.929235      0.901898     -0.717973      -2.60481     0.0617718 
    -0.391341      0.258408     -0.368092       0.53643      -1.35026 
     0.242555       -1.5091   -0.00501431       0.30317         1.244

bad_input.txt.gz

Jul 10 '18 16:07 ghamerly

I've started looking at this; so far, I've replicated these results, but nothing strikes me as obviously wrong about the implementation. I'll run some more detailed tests to see if I can find anything.

Aug 19 '18 20:08 cur33

I would probably tackle this by printing out detailed information about each iteration (both for sort-means and for the naive algorithm), identifying the earliest iteration where something goes wrong (some assignment is incorrect), and drilling into why that happened.

Aug 19 '18 20:08 ghamerly

Will do!

Aug 19 '18 20:08 cur33

fast-kmeans fast-kmeans copied to clipboard

Sort-means appears to have a bug

fast-kmeans
fast-kmeans copied to clipboard