umap Using the transform function and binary metrics for new data doesn't seem to work properly

First of all, congratulations on the beautiful work!

In my project I am using UMAP with metrics for binary data. It performs very well, as can be seen in the figure below.

However, when I use it for new data, the results are mixed (as can be seen in the figure below). This result only happens when I use the metrics for binary data. At first I thought that there could have been a change in the data characteristics. However, when I use other metrics the data seems more coherent (despite the deterioration in performance). Do you have any idea what might be going on? The green dots are the new data.

Jan 10 '22 15:01 dcruzcavalieri

I forgot to put the commands I'm using:

reducer = umap.UMAP(n_neighbors=3, n_components=3, random_state=23, metric='dice', transform_seed=23, min_dist=0.7, n_epochs=1000)
reducer.fit(X1)
umap_data = reducer.transform(X1)

New data: X_test_fs = reducer.transform(X_test)

Jan 10 '22 16:01 dcruzcavalieri

Nothing super obvious springs to mind. The binary metrics like dice are always going to be a little trickier, but conceptually it should all work roughly as expected. I'll try to look into this a little more when I get some time.
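
One check that does not depend on UMAP at all is to measure how far each new point is from its nearest training point under the same Dice dissimilarity. The sketch below is only an illustration: `dice_dissimilarity` and `nearest_train_distance` are hypothetical helpers written with NumPy, not part of the umap API, and the toy arrays stand in for `X1` and `X_test`. It assumes both arrays are 0/1 (or boolean) with the same columns.

```python
import numpy as np


def dice_dissimilarity(a, b):
    """Pairwise Dice dissimilarity between the rows of two binary arrays.

    Hypothetical helper for diagnosis only; it is not a umap function.
    For a pair (u, v): not_equal / (2 * both_true + not_equal).
    """
    a = np.asarray(a).astype(bool).astype(int)
    b = np.asarray(b).astype(bool).astype(int)
    # Number of positions where both rows are 1, for every pair of rows.
    both_true = a @ b.T
    # Number of positions where exactly one of the two rows is 1.
    not_equal = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - 2 * both_true
    denom = 2 * both_true + not_equal
    # Pairs of all-zero rows have denom == 0; treat them as distance 0.
    out = np.zeros(denom.shape, dtype=float)
    np.divide(not_equal, denom, out=out, where=denom > 0)
    return out


def nearest_train_distance(X_train, X_new):
    """Dice dissimilarity from each new row to its closest training row."""
    return dice_dissimilarity(X_new, X_train).min(axis=1)


# Toy stand-ins for X1 (training data) and X_test (new data).
X_train = np.array([[1, 1, 0, 0],
                    [0, 0, 1, 1]])
X_new = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0]])

print(nearest_train_distance(X_train, X_new))
```

If most new points sit at a dissimilarity close to 1 from every training point, the new data probably differs from the training data and a scattered transform is expected. If the distances are small, the input data is less likely to be the cause.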

Jan 11 '22 15:01 lmcinnes

The same issue happens to me, whatever metric I set. The embedding is clear on the training data but messed up on new data.

Jul 04 '22 13:07 shreethamarai