uwot
Metric = "precomputed" is not implemented
I would like to run uwot::umap() with metric = 'pearson'. However, 'pearson' is not an option within this package, and I got the following error:
Error in match.arg(metric, c("euclidean", "cosine", "manhattan", "hamming", : 'arg' should be one of “euclidean”, “cosine”, “manhattan”, “hamming”, “precomputed”
This error suggests that I can use a "precomputed" distance matrix. So I tried to run uwot::umap() with metric = 'precomputed' and got the following error:
Error in create_ann(metric, nc) : BUG: unknown Annoy metric 'precomputed'
This error suggests precomputed is not implemented within this package.
PS. The original umap package allows for metric = 'pearson'. It would be nice to see this added to this package!
metric = "precomputed" is for use with nearest neighbor data, so it requires a list of two matrices, the nearest neighbor indices and the distances. There are some details at https://github.com/jlmelville/uwot#nearest-neighbor-data-format.
It may be the case that uwot can already do what you want. If you have created a full distance matrix yourself, then if you convert it to a dist object, you can pass it directly to the X parameter of uwot without specifying metric = "precomputed", e.g.:
iris_dist <- dist(iris[, -5])
iris_umap <- umap(iris_dist)
I do see that metric = "precomputed" causes an error in the above case, so I will fix that. If you can provide an example of the input you were trying to use, I will try to improve the error reporting for this code path.
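For the Pearson case raised in the question, the dist route described above might look roughly like the following. This is only a sketch: the variable names are made up for illustration, iris stands in for real data, and 1 - correlation is assumed to be the distance wanted. The uwot call is guarded so the snippet still runs where uwot is not installed.

```r
# Sketch: build a Pearson correlation distance and pass it to uwot as a dist object.
X <- as.matrix(iris[, -5])

# cor() correlates columns, so transpose to correlate rows (observations).
# pmax() guards against tiny negative values caused by floating-point rounding.
pearson_dist <- as.dist(pmax(1 - cor(t(X)), 0))

# The dist object goes to the X parameter; no metric argument is given.
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(42)
  iris_pearson_umap <- uwot::umap(pearson_dist)
}
```

Note that a full distance matrix needs memory quadratic in the number of observations, so this is only practical for small to moderate data sets.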
Thank you for the suggestion about other metrics and the vote for Pearson. I would also like to see more, but uwot relies on the metrics that Annoy supports. It's possible that I will get more of the neighbor search part of PyNNDescent implemented in R and then more metrics will be available.
Thank you for the suggestions. I have successfully run uwot::umap() with Pearson correlation via nn_method = list(idx = index_matrix, dist = dist_matrix) and via uwot::umap(dist(dist_matrix), metric = "precomputed"). My dist_matrix and index_matrix were created with Pearson correlation. Unfortunately, I wanted to do metric learning which isn't possible through this implementation.
Although I suspect that this is way too late for @rach226a's purposes, I am temporarily re-opening to note that:
- #64 now allows for transforming new data with precomputed nearest neighbor data, and metric learning works as part of that:
devtools::install_github("jlmelville/vizier")
devtools::install_github("jlmelville/snedata")
fashion <- snedata::download_fashion_mnist()
fashion_train <- head(fashion, 60000)
fashion_test <- tail(fashion, 10000)
# calculate the nearest neighbors outside of uwot (pretend the function isn't the implementation in uwot)
fashion_train.nn <- uwot:::annoy_nn(X = as.matrix(fashion_train[, 1:784]), k = 15, metric = "cosine", ret_index = TRUE)
# return umap map with annoy_nn input
set.seed(1337)
fashion_umap <- uwot::umap(X = NULL, nn_method = fashion_train.nn, ret_model = TRUE, y = fashion_train$Label)
# compute the query-reference annoy_nn
query_ref.nn <- uwot:::annoy_search(X = as.matrix(fashion_test[, 1:784]), k = 15, ann = fashion_train.nn$index)
# use the query-reference annoy_nn to transform query to reference
fashion_umap_test <- uwot::umap_transform(X = NULL, model = fashion_umap, nn_method = query_ref.nn)
vizier::embed_plot(fashion_umap$embedding, fashion_train, cex = 0.5, title = "Fashion UMAP", alpha_scale = 0.075)
vizier::embed_plot(fashion_umap_test, fashion_test, cex = 0.5, title = "Fashion Test UMAP", alpha_scale = 0.075)
- Pearson correlation distance is the same as using cosine distance with each row normalized to zero mean, so it's already available in uwot at the cost of a little work up front:
devtools::install_github("jlmelville/vizier")
devtools::install_github("jlmelville/snedata")
fashion <- snedata::download_fashion_mnist()
fashion_train <- head(fashion, 60000)
fashion_test <- tail(fashion, 10000)
# subtract mean from each row
fashion_trainm <- as.matrix(fashion_train[, 1:784])
fashion_trainm <- fashion_trainm - apply(fashion_trainm, 1, mean)
fashion_testm <- as.matrix(fashion_test[, 1:784])
fashion_testm <- fashion_testm - apply(fashion_testm, 1, mean)
fashion_umap <- uwot::umap(fashion_trainm, metric = "cosine", ret_model = TRUE, y = fashion_train$Label, verbose = TRUE)
fashion_umap_test <- uwot::umap_transform(fashion_testm, model = fashion_umap)
vizier::embed_plot(fashion_umap$embedding, fashion_train, cex = 0.5, title = "Fashion UMAP (Correlation)", alpha_scale = 0.075)
vizier::embed_plot(fashion_umap_test, fashion_test, cex = 0.5, title = "Fashion Test UMAP (Correlation)", alpha_scale = 0.075)
But it would be better for uwot to do this work internally, and add a metric = "correlation" option.
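The equivalence between Pearson correlation distance and cosine distance on mean-centered rows can be checked numerically in base R. The snippet below is a small illustration only; cosine_dist is a hypothetical helper written for this check and is not part of uwot.

```r
# Two arbitrary vectors standing in for two rows of a data matrix.
set.seed(1)
x <- rnorm(20)
y <- rnorm(20)

# Hypothetical helper: cosine distance between two vectors.
cosine_dist <- function(a, b) {
  1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Pearson correlation distance on the raw vectors.
pearson_d <- 1 - cor(x, y)

# Cosine distance after subtracting each vector's own mean.
centered_cosine_d <- cosine_dist(x - mean(x), y - mean(y))

# The two values agree up to floating-point tolerance.
agree <- isTRUE(all.equal(pearson_d, centered_cosine_d))
print(agree)
```

For a whole matrix, the same centering is what the row-mean subtraction in the example above does; X - rowMeans(X) is an equivalent and somewhat faster way to write it.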
I have a distance matrix calculated with a non-supported metric (earth mover distance). How can I get it into the required format? I tried str(fashion_train.nn) from your first example in an attempt to reverse-engineer the format, but it is complex enough so that it's not obvious what is required to move from a square symmetric matrix to that format. Thanks in advance for any help.
To carry out UMAP successfully, your NN data should be in the form of a list consisting of two N x k matrices, where N is the number of points in the data set and k is the number of nearest neighbors. Matrix idx contains the indices of the neighbors of point i in row i. Matrix dist contains the equivalent distances.
If you have a full dense N x N distance matrix, then there is an internal function you can use, uwot:::dist_nn, that will carry out the conversion for you, e.g.:
iris10 <- as.matrix(iris[1:10, -5])
iris10_dm <- as.matrix(dist(iris10))
# get 4 nearest neighbors
iris10_nn <- uwot:::dist_nn(iris10_dm, k = 4)
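Because uwot:::dist_nn is internal and could change between releases, the same conversion can also be sketched in base R. The helper below, dist_matrix_to_nn, is hypothetical and not part of uwot. It assumes each point should appear as its own first neighbor at distance 0, which is my understanding of the format uwot expects; please check the nearest neighbor data format documentation for your version.

```r
# Hypothetical helper (not part of uwot): convert a dense N x N distance
# matrix into a list of two N x k matrices, idx and dist.
dist_matrix_to_nn <- function(dm, k) {
  dm <- as.matrix(dm)
  n <- nrow(dm)
  stopifnot(ncol(dm) == n, k >= 1, k <= n)
  idx <- matrix(0L, nrow = n, ncol = k)
  nn_dist <- matrix(0, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    # Assumption: point i is its own first neighbor, followed by the
    # k - 1 closest other points in order of increasing distance.
    others <- setdiff(seq_len(n), i)
    nearest <- others[order(dm[i, others])][seq_len(k - 1)]
    idx[i, ] <- c(i, nearest)
    nn_dist[i, ] <- dm[i, idx[i, ]]
  }
  list(idx = idx, dist = nn_dist)
}

# Same toy input as above: 10 iris points, 4 nearest neighbors.
iris10_dm <- as.matrix(dist(iris[1:10, -5]))
iris10_nn_manual <- dist_matrix_to_nn(iris10_dm, k = 4)
```

The resulting list could then be passed as nn_method, as in the Fashion MNIST example earlier in this thread. Any square symmetric distance matrix, such as one of earth mover distances, can be used in place of iris10_dm.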
Thanks!
David Katz, TIBCO Data Science
Using precomputed nearest neighbors is covered at https://jlmelville.github.io/uwot/articles/hnsw-umap.html and https://jlmelville.github.io/uwot/articles/rnndescent-umap.html. Pearson correlation is now supported with metric = "correlation".
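A minimal sketch of that option, using iris as stand-in data. It assumes an installed uwot version recent enough to accept metric = "correlation"; the call is guarded so the snippet does nothing where uwot is missing.

```r
# Stand-in data: the four numeric iris columns.
X <- as.matrix(iris[, -5])

if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(42)
  # No manual row centering or precomputed distances needed here.
  iris_corr_umap <- uwot::umap(X, metric = "correlation")
}
```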