[FEA] t-SNE initialization, learning rate, and exaggeration
I was in contact with Victor Lafargue, who suggested I ping @danielhanchen for all cuML t-SNE-related questions. So here goes: three suggestions, with some questions mixed in.
- Recent research https://www.nature.com/articles/s41467-019-13056-x (disclaimer: mine) suggests that smart non-random initialization can be very important for good embeddings. For example, UMAP uses Laplacian Eigenmaps initialization, but t-SNE is often initialized randomly, which leads to unfair comparisons (see here for more details on UMAP vs t-SNE initialization: https://www.biorxiv.org/content/10.1101/2019.12.19.877522v1). Many t-SNE implementations such as FIt-SNE and openTSNE have recently switched to PCA initialization as the default. This should be very simple to implement and is more sensible than random initialization. Does cuML use random init for t-SNE? If so, it would be great to default to PCA (or some other non-random) init. A rough sketch of this and the next point is included after this list.
- Another issue discussed in the same paper is the learning rate: the traditional default learning rate (200) can be WAY too small for large datasets. We recommend the learning_rate = N/12 heuristic as the default for sample size N (where 12 is the early exaggeration factor). I saw that cuML uses some adaptive method by default ("Uses a special adaptive method that tunes the learning rate, early exaggeration and perplexity automatically based on input size"), but could not find a description anywhere. How exactly are these parameters chosen?
- We also show (in that paper and elsewhere in upcoming work) that t-SNE with exaggeration > 1 can be helpful for large datasets (and exaggeration = 4 produces results very similar to UMAP). It would be great to have this parameter adjustable in the API (not early exaggeration, but the exaggeration used after early exaggeration stops).
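A minimal sketch of the first two suggestions, using scikit-learn rather than cuML (the data here is a random placeholder, and the factor 12 is the early exaggeration value assumed above):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(5000, 50).astype(np.float32)  # placeholder data
N = X.shape[0]

# Non-random initialization: first two principal components.
pca_init = PCA(n_components=2).fit_transform(X)

# Learning-rate heuristic: N / early_exaggeration (here 12),
# with a floor at the traditional default of 200.
learning_rate = max(N / 12.0, 200.0)

embedding = TSNE(init=pca_init, learning_rate=learning_rate).fit_transform(X)
```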
Hi @dkobak.
- Anyway, fantastic research paper! Good work! Yes, you are correct that PCA init (or, say, Laplacian Eigenmaps) will generate much better t-SNE outputs. Currently, TSNE does support random or PCA init. The reason random is the default is that sklearn's default is random (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html). However, I agree with your suggestion that PCA should be the default.
- Yes, LRs of 200 are way too low. I noticed this as well. However, currently a heuristic (as you mentioned) is used. Fascinatingly, by pure coincidence:

  ```python
  self.pre_learning_rate = max(n / 3.0, 1)
  self.post_learning_rate = self.pre_learning_rate
  self.early_exaggeration = 24.0 if n > 10000 else 12.0
  ```

  So currently n/3 is used (I just guessed this from empirical runs on MNIST etc.). Changing to n/12 or n/early_exaggeration sounds much more reasonable now that there's real research to back it (i.e. yours); a sketch of that change is included after this list.
- So currently it's disabled, but `post_learning_rate` is the one you're talking about. It's currently set to `self.pre_learning_rate`, but this can be changed. No algorithmic changes necessary.
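For concreteness, a sketch of what that change to the default might look like (plain Python mirroring the snippet above, not the actual cuML code):

```python
# Hypothetical sketch of the proposed default, not actual cuML code.
n = 1_000_000                                          # example sample size
early_exaggeration = 24.0 if n > 10000 else 12.0
pre_learning_rate = max(n / early_exaggeration, 1.0)   # was: max(n / 3.0, 1)
post_learning_rate = pre_learning_rate
print(pre_learning_rate)                               # ~41667 for a million points
```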
@danielhanchen Thanks for the quick reply!
- Yes! Strongly agree with using PCA init by default even though `sklearn` does not. The documentation here https://docs.rapids.ai/api/cuml/stable/api.html?highlight=tsne#cuml.TSNE only mentions random init as supported. If PCA init is already implemented, that's great. By the way, as described in our paper, it's better to scale the initialization to have a small radius. The default random Gaussian init has std=0.0001, so we scale the PCA init to make PC1 have std=0.0001 (a short sketch of this rescaling follows after this list).
- Oh, I see. There is another reference for `learning_rate = n/early_exaggeration`: https://www.nature.com/articles/s41467-019-13055-y (published back to back with ours). In fact, using `n/3` together with early exaggeration of 12 or 24 could lead to divergences in some cases.
- It's not `post_learning_rate` that I meant here, but `post_exaggeration`.
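For reference, rescaling a PCA initialization to the small radius described above could look roughly like this (an illustrative sketch, not the cuML implementation):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(10_000, 50)               # placeholder data
init = PCA(n_components=2).fit_transform(X)
init = init / np.std(init[:, 0]) * 1e-4      # rescale so PC1 has std 0.0001
print(np.std(init[:, 0]))                    # -> ~0.0001
```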
@dkobak Oh yep. PCA scaling is done, although slightly differently. I think (can't remember 100%) it was uniform within [-0.0001f, 0.0001f].
Interesting! n/early_exaggeration sounds much better then.
Oh yes, OK, I misread that. You essentially would like that after 250 iterations or so, instead of `VAL *= (1 / early_exaggeration)`, it becomes `VAL *= (post_exaggeration / early_exaggeration)`. `VAL` is the values array of the CSR sparse format.
Yes, exactly!
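A toy illustration of that switch on a CSR values array (the variable names are made up, and this is not the actual cuML kernel code):

```python
from scipy.sparse import random as sparse_random

early_exaggeration = 12.0
post_exaggeration = 4.0   # exaggeration kept after the early phase (~UMAP-like at 4)

# Toy affinity matrix in CSR format; in t-SNE this would be the symmetrized P.
P = sparse_random(1000, 1000, density=0.01, format="csr")
P.data *= early_exaggeration                 # early exaggeration phase

# After ~250 iterations, instead of removing the exaggeration entirely...
# P.data *= 1.0 / early_exaggeration
# ...keep a residual exaggeration:
P.data *= post_exaggeration / early_exaggeration
```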
Hi @danielhanchen, it would be great if we could use PCA init in TSNE! You mentioned PCA init is already implemented in cuML TSNE, but I could not find it in the latest code: https://github.com/rapidsai/cuml/blob/branch-0.15/python/cuml/manifold/t_sne.pyx#L239 Is it in another repo or branch?
@resnant Oh I just checked. Seems like I was wrong. I was working on https://github.com/rapidsai/cuml/pull/1383 [TSNE PCA Init + 50% Memory Reductions]. It included PCA Init, 50% mem reductions and some stability fixes. Just forgot it didn't get merged yet.
I'm not so sure when I'll get back to it, but I'll see what I can do!
@danielhanchen Got it. Your work is awesome! It will be very helpful to me if it gets merged. Thanks anyway.
@resnant Thanks a lot!!! I'll see what I can do.
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Any plans to work on PCA initialization again?
In scikit-learn, it's been the default for almost 2 years: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html