Using the transform function and binary metrics for new data doesn't seem to work properly

Open dcruzcavalieri opened this issue 4 years ago • 3 comments

First of all, congratulations on the beautiful work!

In my project I am using UMAP with metrics for binary data. It performs very well, as can be seen in the figure below.

image

However, when I use it to transform new data, the results are mixed (as can be seen in the figure below). This only happens when I use the metrics for binary data. At first I thought there might have been a change in the data characteristics. However, when I use other metrics the new data looks more coherent (despite the drop in performance). Do you have any idea what might be going on? The green dots are the new data.

image

dcruzcavalieri, Jan 10 '22 15:01

I forgot to put the commands I'm using:

reducer = umap.UMAP(n_neighbors=3, n_components=3, random_state=23, metric='dice', transform_seed=23, min_dist=0.7, n_epochs=1000)
reducer.fit(X1)
umap_data = reducer.transform(X1)

New data:

X_test_fs = reducer.transform(X_test)
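One way to rule out the "change in the data characteristics" explanation before looking at UMAP itself is to compare how far each new point is from its nearest training point with how far training points are from each other, under the same Dice dissimilarity. The following is only a rough sketch with made-up toy data standing in for X1 and X_test; the helper names are hypothetical, it is not UMAP's internal code, and returning 0.0 for two all-zero vectors is a convention chosen here.

```python
def dice_distance(a, b):
    """Dice dissimilarity between two equal-length binary sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    differ = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    if both == 0 and differ == 0:
        return 0.0  # convention chosen here for two all-zero vectors
    return differ / (2 * both + differ)


def nearest_train_distance(point, train, skip_index=None):
    """Smallest Dice distance from `point` to any row of `train`.

    `skip_index` excludes one training row, so a training point
    is not compared with itself.
    """
    return min(
        dice_distance(point, row)
        for i, row in enumerate(train)
        if i != skip_index
    )


# Toy stand-ins for the X1 and X_test arrays from the post (made-up values).
X1 = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
X_test = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]

# Nearest-neighbour distance of each training point within the training set.
train_nn = [
    nearest_train_distance(row, X1, skip_index=i) for i, row in enumerate(X1)
]
# Nearest-neighbour distance of each new point to the training set.
test_nn = [nearest_train_distance(row, X1) for row in X_test]

print("train:", train_nn)
print("test: ", test_nn)
```

If the new points sit much further from the training set than training points sit from each other, the scattered embedding may simply reflect a shift in the data; if the distances are comparable, the transform step is the more likely suspect.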

dcruzcavalieri, Jan 10 '22 16:01

Nothing super obvious springs to mind. The binary metrics like dice are always going to be a little trickier, but conceptually it should all work roughly as expected. I'll try to look into this a little more when I get some time.

lmcinnes, Jan 11 '22 15:01

The same issue happens to me, whatever metric I set. The embedding is clear for the training data but messed up for new data.

shreethamarai, Jul 04 '22 13:07