tensorboard
Embedding Projector: UMAP and TSNE projections broken for embeddings that are not normalized
Chrome 111.0.5563.64
Issue description
When using embeddings that are not normalized and sphereized, the UMAP and T-SNE projections are either incorrect or never finish loading. See #5547 for a previous bug report.
The reason is that the KNN step expects normalized vectors for cosine distance (cosDistNorm) rather than arbitrary vectors.
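To illustrate why this matters: a cosine distance that assumes unit-length inputs reduces to one minus the dot product, and that only equals the true cosine distance when both vectors are normalized. The sketch below uses Python with NumPy rather than the projector's own code. `cos_dist_norm` is a hypothetical stand-in named after `cosDistNorm`, and its body is an assumption about the behavior, not the actual implementation.

```python
import numpy as np

def cos_dist(a, b):
    """True cosine distance: valid for vectors of any non-zero length."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def cos_dist_norm(a, b):
    """Shortcut that is only correct for unit vectors (assumed behavior)."""
    return 1.0 - np.dot(a, b)

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])  # same direction as a, different length

# Parallel vectors should be at cosine distance 0.
true_dist = cos_dist(a, b)

# On raw vectors the shortcut gives 1 - 50 = -49, which is not a valid distance.
shortcut_dist = cos_dist_norm(a, b)

# After scaling both vectors to unit length, the shortcut agrees with the true value.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
shortcut_on_units = cos_dist_norm(a_unit, b_unit)
```

An out-of-range "distance" like this could plausibly upset the nearest-neighbor search that T-SNE and UMAP depend on, which would be consistent with the symptoms described here.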
- Build and launch projector (must be from master, not https://projector.tensorflow.org)
- Select Iris demo tensor dataset
- Keep "Sphereize data" unchecked
- Change projection type to T-SNE
- The end result is different from what happens when using https://projector.tensorflow.org/
- Change projection type to UMAP
- See the "Initialize UMAP..." modal loading forever
Alternative repro:
- Build and launch projector (must be from master, not https://projector.tensorflow.org)
- Uncheck "Sphereize data" on the default Word2Vec 10k dataset
- Switch projection from "PCA" to either t-SNE or UMAP
- See the UI break, with the "Initializing t-SNE..."/"Initialize UMAP..." modal loading forever
Related to: #2421
I'm currently facing this issue. Is there any workaround? Can I pass a normalized embedding to the checkpoint? If so, what would be the correct normalization? I have tried a few approaches, but could not get any of them to work.
EDIT: This issue is still persistent in v2.13.0. I'm using Windows 11 and Google Chrome.
@dmfolgado you can either use the built-in "Sphereize data" option or normalize the embedding yourself. I think all you need is to make sure each vector is a unit vector in Euclidean space. Sphereizing does this and, in addition, shifts the centroid to the origin, so it might work even better.
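For anyone who wants to do the normalization outside the projector, here is a minimal sketch of both options in Python with NumPy. The function names `unit_normalize` and `sphereize` and the sample array are hypothetical. `sphereize` follows the description above (center on the centroid, then scale each vector to unit length) and is not taken from the projector's source.

```python
import numpy as np

def unit_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean (L2) length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)  # eps guards against all-zero rows

def sphereize(x):
    """Shift the centroid to the origin, then scale each row to unit length."""
    centered = x - x.mean(axis=0, keepdims=True)
    return unit_normalize(centered)

# Hypothetical embedding matrix: one row per data point.
embeddings = np.array([
    [3.0, 4.0],
    [1.0, 0.0],
    [0.0, 2.0],
    [-5.0, 12.0],
])

normalized = unit_normalize(embeddings)
sphereized = sphereize(embeddings)
```

Either result can then be exported in place of the raw vectors, for example as a tab-separated file with `np.savetxt("vectors.tsv", normalized, delimiter="\t")`, where the file name is only an example.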
Thank you for the suggestion. I had already tried that, but I still get different projections from the online Projector and the offline one.
Local version. Clustering the CBF time series dataset (X_test)
Online version. Clustering the CBF time series dataset (X_test)
I attach the data and metadata for reproducibility. data.zip
I just found what was causing the discrepancy. The data I was using for the online projection was standard scaled. It seems that using standard-scaled data together with sphereizing yields the same results.
Any update on the issue of getting stuck on the "Initialize UMAP..." message? I am currently facing the exact same problem, which I cannot reproduce with the online Embedding Projector using the same data. Unless "Sphereize data" is checked before clicking on UMAP, the TensorBoard Embedding Projector remains stuck on "Initialize UMAP...".