Embedding Projector: UMAP and TSNE projections broken for embeddings that are not normalized

Open alicialics opened this issue 2 years ago • 5 comments

Chrome 111.0.5563.64

Issue description

When using embeddings that are not normalized or sphereized, the UMAP and t-SNE projections are either incorrect or simply do not load. See #5547 for a previous bug report.

The reason is that the KNN step expects normalized (unit-length) vectors for its cosine distance (cosDistNorm) rather than arbitrary vectors.
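To illustrate why this matters, here is a minimal sketch in plain Python. It assumes the normalized shortcut is computed as 1 - dot(a, b), which is only a cosine distance when both inputs already have unit length. The helper names (`cos_dist`, `cos_dist_norm`, `unit`) are hypothetical and are not TensorBoard's actual code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cos_dist(a, b):
    """Full cosine distance; valid for any non-zero vectors."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def cos_dist_norm(a, b):
    """Assumed shortcut that skips the division; only correct for unit vectors."""
    return 1.0 - dot(a, b)

def unit(v):
    """Scale a non-zero vector to unit Euclidean length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, so the true cosine distance is 0

true_dist = cos_dist(a, b)                       # 0.0
raw_shortcut = cos_dist_norm(a, b)               # 1 - 50 = -49.0, not a valid distance
unit_shortcut = cos_dist_norm(unit(a), unit(b))  # approximately 0.0

print(true_dist, raw_shortcut, unit_shortcut)
```

With unnormalized inputs the shortcut can return large negative values, which would plausibly corrupt the nearest-neighbor graph that t-SNE and UMAP are built on.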

  • Build and launch projector (must be from master, not https://projector.tensorflow.org)
  • Select Iris demo tensor dataset
  • Keep "Sphereize data" unchecked
  • Change projection type to t-SNE
  • The end result is different from what happens when using https://projector.tensorflow.org/
  • Change projection type to UMAP
  • See "Initialize UMAP..." modal loading forever

Alternative repro:

  • Build and launch projector (must be from master, not https://projector.tensorflow.org)
  • Uncheck "Sphereize data" on the default Word2Vec 10k dataset
  • Switch projection from "PCA" to either t-SNE or UMAP
  • See the UI breaks with "Initializing t-SNE..."/"Initialize UMAP..." modal loading forever

related to: #2421

alicialics avatar Mar 25 '23 01:03 alicialics

I'm currently facing this issue. Is there any workaround? Can I pass a normalized embedding to the checkpoint? If so, what would be the correct normalization? I have tried a few approaches, but I could not get any of them to work.

EDIT: This issue is still persistent in v2.13.0. I'm using Windows 11 and Google Chrome.

dmfolgado avatar May 20 '23 00:05 dmfolgado

@dmfolgado you can either use the built-in "Sphereize data" option or normalize the embedding yourself. I think all you need is to make sure each embedding is a unit vector in Euclidean space. Sphereizing does this and, in addition, shifts the centroid to the origin, so it might even work better.
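For anyone normalizing the data themselves, a rough sketch of both options follows, using only the Python standard library. The function names and the sample rows are made up for illustration, and the sphereize step here (subtract the centroid, then scale each row to unit length) is my reading of the description above rather than a copy of the projector's implementation.

```python
import math

def l2_normalize(rows):
    """Scale every row to unit Euclidean length (zero rows are left unchanged)."""
    out = []
    for r in rows:
        n = math.sqrt(sum(x * x for x in r))
        out.append([x / n for x in r] if n > 0 else list(r))
    return out

def sphereize(rows):
    """Shift the centroid to the origin, then scale every row to unit length."""
    dim = len(rows[0])
    centroid = [sum(r[i] for r in rows) / len(rows) for i in range(dim)]
    centered = [[r[i] - centroid[i] for i in range(dim)] for r in rows]
    return l2_normalize(centered)

def to_tsv(rows):
    """Format rows as tab-separated text, one embedding per line."""
    return "\n".join("\t".join(repr(x) for x in r) for r in rows)

# Hypothetical sample embeddings (three 4-dimensional rows).
embeddings = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

normalized = l2_normalize(embeddings)
sphered = sphereize(embeddings)
print(to_tsv(sphered))
```

Either output can be saved as a TSV file and loaded into the projector with "Sphereize data" left unchecked.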

alicialics avatar May 21 '23 18:05 alicialics

Thank you for the suggestion. I had already tried that, but I still get different projections between the online Projector and the offline one.

Local version. Clustering the CBF time series dataset (X_test) offline

Online version. Clustering the CBF time series dataset (X_test) online

I attach the data and metadata for reproducibility. data.zip

dmfolgado avatar May 21 '23 20:05 dmfolgado

I just found what was causing the discrepancy. The data I was using for the online projection was standard scaled. It seems that using standard-scaled data together with sphereizing yields the same results.

dmfolgado avatar May 21 '23 20:05 dmfolgado

Any update on the issue of getting stuck on the "Initialize UMAP..." message? I am currently facing the exact same problem, which I cannot reproduce with the online embedding projector using the same data. Unless "Sphereize data" is checked before clicking on UMAP, the TensorBoard embedding projector remains stuck on "Initialize UMAP...".

delale avatar Sep 11 '23 08:09 delale