uwot
Reproducibility issue with cosine metric
This is a followup to issue #46.
The reproducibility issues described there have been fixed for me in 0.1.8 by using approx_pow = TRUE with the euclidean or manhattan metric, but I still have problems when using cosine.
Here's a result on my laptop (Ubuntu 18.04, R 3.6.3, uwot 0.1.8):
> set.seed(13); head(uwot::umap(iris, metric = "cosine", init="spca", a=1, b=1, approx_pow=TRUE), 5)
         [,1]      [,2]
[1,] 2.190465 -14.45460
[2,] 2.153269 -11.64510
[3,] 2.337686 -14.14382
[4,] 1.191009 -12.59075
[5,] 1.472325 -15.06042
And here's the same thing on a server (CentOS 7, R 3.6.1, uwot 0.1.8):
> set.seed(13); head(uwot::umap(iris, metric = "cosine", init="spca", a=1, b=1, approx_pow=TRUE), 5)
          [,1]      [,2]
[1,] -15.45597 -4.156313
[2,] -17.59474 -4.357967
[3,] -15.25843 -4.456960
[4,] -17.01195 -2.813276
[5,] -14.92331 -3.548293
The results are the same on both machines when run with metric = "euclidean".
However, in the newest version, euclidean with approx_pow = TRUE still has the same problem.
Unfortunately, I cannot give you a satisfactory solution to these issues. As far as I can tell, we are at the mercy of whatever system libraries are part of the base OS.
One thing that could cause issues is the spca initialization: here you are at the mercy of the SVD routine, which can also produce arbitrary signs. For example, from the man page for prcomp:
The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.
I assume the spectral initialization would have the same issue. I don't recommend using init="random", but if it gives consistent results across architectures, then at least you know the initialization is the cause.
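If the arbitrary SVD signs are the culprit, one possible workaround is to compute the PCA yourself, force a deterministic sign convention, and pass the resulting matrix as the initialization. This is a minimal sketch under stated assumptions, not something uwot does internally: fix_signs is a hypothetical helper that flips each column so its largest-magnitude entry is positive. It only removes sign differences; it does not reproduce the extra rescaling that uwot's "spca" applies, and it cannot fix other platform-dependent steps such as the nearest neighbor search or floating-point differences during optimization.

```r
# Hypothetical helper (not part of uwot): make the signs of principal
# component scores deterministic, since prcomp()/svd() may return
# columns with arbitrary signs on different builds of R.
fix_signs <- function(scores) {
  for (j in seq_len(ncol(scores))) {
    # Flip column j so that its largest-magnitude entry is positive
    if (scores[which.max(abs(scores[, j])), j] < 0) {
      scores[, j] <- -scores[, j]
    }
  }
  scores
}

x <- as.matrix(iris[, 1:4])
pcs <- prcomp(x, center = TRUE)$x[, 1:2]

# Simulate what a different SVD build might return: same axes, first sign flipped
flipped <- pcs %*% diag(c(-1, 1))

init_a <- fix_signs(pcs)
init_b <- fix_signs(flipped)

# After normalization both versions agree
print(all.equal(init_a, init_b, check.attributes = FALSE))  # [1] TRUE
```

Since uwot::umap also accepts a numeric matrix for init, the normalized coordinates could then be supplied as init = init_a (rescaled if you want to mimic "spca"). If results still differ across machines with a fixed init matrix, the initialization is not the only source of the discrepancy.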
Thanks. I will continue to use a single OS to avoid this problem. For now, I was able to reproduce my results on Ubuntu 22 on a different machine.