Reproducibility issue with cosine metric

Open juba opened this issue 4 years ago • 4 comments

This is a followup to issue #46.

The reproducibility issues described there have been fixed for me in 0.1.8 by using approx_pow = TRUE with a Euclidean or Manhattan metric, but I still face problems when using cosine.

Here's a result on my laptop (Ubuntu 18.04, R 3.6.3, uwot 0.1.8):

> set.seed(13); head(uwot::umap(iris, metric = "cosine", init="spca", a=1, b=1, approx_pow=TRUE), 5)
         [,1]      [,2]
[1,] 2.190465 -14.45460
[2,] 2.153269 -11.64510
[3,] 2.337686 -14.14382
[4,] 1.191009 -12.59075
[5,] 1.472325 -15.06042

And here's the same thing on a server (CentOS 7, R 3.6.1, uwot 0.1.8):

> set.seed(13); head(uwot::umap(iris, metric = "cosine", init="spca", a=1, b=1, approx_pow=TRUE), 5)
          [,1]      [,2]
[1,] -15.45597 -4.156313
[2,] -17.59474 -4.357967
[3,] -15.25843 -4.456960
[4,] -17.01195 -2.813276
[5,] -14.92331 -3.548293

With metric = "euclidean", both machines give the same results.

juba avatar Mar 23 '20 18:03 juba

But in the newest version, euclidean with approx_pow still has the same problem.

tlz4320 avatar Nov 27 '22 01:11 tlz4320

Unfortunately, I cannot give you a satisfactory solution to these issues. As far as I can tell, we are at the mercy of whatever system libraries are part of the base OS.

jlmelville avatar Nov 27 '22 02:11 jlmelville

One thing that could cause issues is the spca initialization: here you are at the mercy of the SVD routine, which can also produce arbitrary signs, e.g. from the man page for prcomp:

The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.

But I assume the spectral initialization will have the same issue. I don't recommend using init="random" but if it gives consistent results across architectures then at least you know the initialization is the issue.
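A rough sketch of how one might check this, offered as an illustration rather than a tested fix. It builds a PCA initialization with a deterministic sign convention and passes it to umap as a matrix, then runs the same call with init = "random" so the two outputs can be compared across machines. The helper fix_signs and the variable names are hypothetical (not part of uwot), and the 1e-4 rescaling is only an assumption meant to approximate what "spca" does. Fixing signs removes one source of variation only; differences from the system numerical libraries may still remain.

```r
# Hypothetical helper (not part of uwot): flip each column so that its
# largest-magnitude entry is positive, removing the arbitrary PCA sign.
fix_signs <- function(m) {
  for (j in seq_len(ncol(m))) {
    i <- which.max(abs(m[, j]))
    if (m[i, j] < 0) m[, j] <- -m[, j]
  }
  m
}

# Numeric columns of iris only
X <- as.matrix(iris[, 1:4])

# First two principal components with a fixed sign convention
pca_init <- fix_signs(prcomp(X, rank. = 2)$x)

# Assumption: rescale each column to a standard deviation of 1e-4,
# roughly mimicking the scale of init = "spca"
pca_init <- scale(pca_init) * 1e-4

if (requireNamespace("uwot", quietly = TRUE)) {
  # Same settings as the original report, but with the sign-fixed PCA init
  set.seed(13)
  emb_pca <- uwot::umap(X, metric = "cosine", init = pca_init,
                        a = 1, b = 1, approx_pow = TRUE)

  # Diagnostic run: random init, to see whether the initialization is
  # what differs between machines
  set.seed(13)
  emb_rand <- uwot::umap(X, metric = "cosine", init = "random",
                         a = 1, b = 1, approx_pow = TRUE)

  print(head(emb_pca, 5))
  print(head(emb_rand, 5))
}
```

Running this on both machines and comparing the printed rows should show whether the random-init output agrees while the PCA-based one does not, which would point at the initialization.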

jlmelville avatar Nov 27 '22 02:11 jlmelville

Thanks. I will keep using a single OS to avoid this problem. I can now reproduce my results on Ubuntu 22 across different machines.

tlz4320 avatar Nov 28 '22 05:11 tlz4320