
Using LargeVis instead of t-SNE?

DataWaveAnalytics opened this issue 8 years ago • 9 comments

Thanks for sharing your interesting work.

I would recommend using LargeVis instead of t-SNE to get the low-dimensional representation.
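For context, here is a minimal sketch of the kind of t-SNE step I'm suggesting you replace (illustrative names and placeholder data, not your actual code); LargeVis would slot in exactly here:

```python
# Minimal sketch of the current t-SNE reduction step that LargeVis would replace
import numpy as np
from sklearn.manifold import TSNE

vectors = np.random.rand(300, 100)  # stand-in for 300 word vectors of dimension 100

# t-SNE: quality hinges on perplexity, and runtime grows quickly with n
coordinates_2d = TSNE(n_components=2, perplexity=30.0).fit_transform(vectors)
print(coordinates_2d.shape)  # (300, 2)
```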

DataWaveAnalytics avatar Jan 20 '17 04:01 DataWaveAnalytics

Wow, it's easy to see that's actually a great suggestion!

If you wish to do so yourself and create a pull request, I would gladly accept it. Otherwise, you can expect that I'll probably get to it sooner or later.

Thanks and all the best.

legel avatar Jan 20 '17 04:01 legel

I recently had great success with a very similar graph extraction technique on Wikipedia, feeding it into the same type of collocation / nearest-neighbor computation that word2vec implicitly performs, using Swivel. I plan to push parts of those results here fairly soon...

legel avatar Jan 20 '17 04:01 legel

Cool, more interesting work coming up. The authors of LargeVis also created LINE for learning network embeddings; both are cool, scalable algorithms in case you need them.

Attached is a visualization of the tech people made with LargeVis (without the nicely colored clusters). I didn't play much with the parameters, but you can see on the left side that Larry Page is much closer to Sergey Brin than in the t-SNE visualization. I think LargeVis should be considered the state-of-the-art algorithm, especially since with t-SNE you have to be careful setting the perplexity.

tech.svg.zip

DataWaveAnalytics avatar Jan 20 '17 05:01 DataWaveAnalytics

Thanks for the tip about LINE as well; I'll have to analyze that one, but in any case it's very impressive and cool.

I can agree right away from your LargeVis demo that it's better, especially if you didn't have to worry about the perplexity parameter, which makes or breaks the whole optimization. Also, Larry should never be too far from Sergey. :)
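To show what I mean by make-or-break, here's a toy sweep (illustrative code, not from the repo) where the same vectors can land in visibly different layouts at different perplexities:

```python
# Toy perplexity sweep: small changes can rearrange the whole map
import numpy as np
from sklearn.manifold import TSNE

vectors = np.random.rand(100, 50)  # stand-in for word vectors

for perplexity in (5, 15, 30, 50):
    coords = TSNE(n_components=2, perplexity=perplexity,
                  random_state=0).fit_transform(vectors)
    # inspect or plot coords for each setting; layouts can differ drastically
    print(perplexity, coords.std(axis=0))
```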

legel avatar Jan 20 '17 05:01 legel

I've been running tests of your recommendation of HDBSCAN for the clustering, and the results are promising when using the output of LargeVis. Attached is the output of HDBSCAN clustering on the tech people (n=63, min_cluster_size=3) and MNIST (n=70000, min_cluster_size=25) datasets.

tech_people.pdf mnist.lv.hdbscan.pdf
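The pipeline was essentially the following (a rough sketch with placeholder data rather than my exact script):

```python
# Rough sketch: cluster 2D LargeVis coordinates with HDBSCAN
# (placeholder data; min_cluster_size matches the values reported above)
import numpy as np
import hdbscan

coords = np.random.rand(63, 2)  # stand-in for LargeVis output on the tech people

clusterer = hdbscan.HDBSCAN(min_cluster_size=3)
labels = clusterer.fit_predict(coords)  # -1 marks points left as noise
print(set(labels))
```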

DataWaveAnalytics avatar Jan 25 '17 00:01 DataWaveAnalytics

Amazing work demonstrating LargeVis on MNIST vectors. Was this dimensionality reduction from the raw pixel values? It's clear that LargeVis gives us 10 distinct clusters (especially when you're looking for them), but some are more grouped than others, and this seems to confuse HDBSCAN a bit. I think @lmcinnes would find these results interesting too. :)

legel avatar Jan 25 '17 05:01 legel

So I'm actually working on a (generalized) LargeVis implementation myself, more to explore some other fun things I believe you can do (LargeVis on arbitrary dataframes rather than just vector space data). I think it is important to note that at its core LargeVis really is just t-SNE -- mostly what it provides is some very clever ideas for improving the optimization, and hence it can reach a better KL-divergence on large datasets. In practice I see LargeVis as important because of its scalability to very large datasets; if you have small data I would expect little difference from t-SNE.
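For reference, the objective both methods are minimizing is the same KL divergence between the high-dimensional and low-dimensional neighbor distributions:

```latex
\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```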

In the case of the MNIST dataset, yes, that's quite interesting -- you do actually get groups, but the sub-grouping going on (and the large gaps between the larger groups) does make HDBSCAN lump them together (not unreasonably, in some sense). I expect the condensed tree will have that substructure well described. I have been starting to explore alternative cluster selection methods beyond the direct Excess of Mass algorithm used right now, but I haven't really found anything great (except that the leaves of the condensed tree are sometimes actually what you want for specific use cases).
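If you want to look at that substructure yourself, something like this will draw the condensed tree (a quick sketch, with placeholder data standing in for the MNIST embedding):

```python
# Sketch: inspect HDBSCAN's condensed tree to see the sub-cluster structure
import numpy as np
import hdbscan
import matplotlib.pyplot as plt

coords = np.random.rand(1000, 2)  # placeholder for the 2D MNIST embedding

clusterer = hdbscan.HDBSCAN(min_cluster_size=25).fit(coords)
clusterer.condensed_tree_.plot()  # draws the cluster hierarchy, substructure included
plt.show()
```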

lmcinnes avatar Jan 25 '17 11:01 lmcinnes

@legel yes, I used the raw pixel values to compute it. @lmcinnes It's true that LargeVis is partially based on t-SNE, and as you mention, the most important feature is the scalability. There are three main components in LargeVis:

  • An efficient KNN graph computation, much faster than using VP-trees (see the sketch below).
  • A probabilistic model over binary edges, optimized with sampling methods.
  • Asynchronous Stochastic Gradient Descent for the optimization.

I've seen differences even on small datasets; in my experience with both methods, LargeVis is superior.
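Here's a rough sketch of the approximate-KNN idea using Annoy, which also builds random-projection trees. To be clear, Annoy is not what LargeVis uses internally (LargeVis builds its own random-projection trees and then refines the graph by exploring neighbors of neighbors), but it illustrates why this is so much faster than exact search:

```python
# Approximate KNN via random-projection trees (Annoy), illustrating the
# first component of LargeVis; not the LargeVis code itself
import numpy as np
from annoy import AnnoyIndex

dim = 100
vectors = np.random.rand(10000, dim)

index = AnnoyIndex(dim, 'euclidean')
for i, vector in enumerate(vectors):
    index.add_item(i, vector.tolist())
index.build(10)  # build 10 random-projection trees

# 10 approximate nearest neighbors of point 0 (drop the point itself)
neighbors = index.get_nns_by_item(0, 11)[1:]
print(neighbors)
```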

DataWaveAnalytics avatar Jan 30 '17 11:01 DataWaveAnalytics

@lmcinnes thanks for all of this; looking forward to hearing about / exploring any updates! @csanhuezalobos I'd like to swap out t-SNE in the reduce_dimensionality function, with at least an option to use LargeVis. The C++ compilation is not too bad, and I think I can pop it into install.sh as needed with no problem across OS X and Linux; it was their Python wrapper that tripped me up a bit. If you have any scripts that play well with the Word2Vec vectors from load_derived_vectors, I'd appreciate seeing those! Of course a pull request here would be most welcomed, but either way, thanks.
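For anyone who wants to try wiring this in, the shape of what I have in mind is roughly the following. This is only a sketch, not tested code: the binary name, the -fea / -input / -output / -outdim flags, and the file format are from my reading of the reference LargeVis implementation, so double-check them against its README before relying on this.

```python
# Sketch of a LargeVis-backed reduce_dimensionality -- NOT tested code.
# Assumes a compiled LargeVis binary on PATH; flags and the file format
# (header line "n_points n_dims", then one vector per line) follow my
# reading of the reference implementation's docs.
import subprocess
import numpy as np

def reduce_dimensionality_largevis(vectors, out_dims=2):
    np.savetxt("vectors.txt", vectors, comments="",
               header="{} {}".format(vectors.shape[0], vectors.shape[1]))
    subprocess.check_call([
        "LargeVis",                 # assumed: compiled binary on PATH
        "-fea", "1",                # input is feature vectors, not a network
        "-input", "vectors.txt",
        "-output", "coordinates.txt",
        "-outdim", str(out_dims),
    ])
    # output mirrors the input format: a header line, then one row per point
    return np.loadtxt("coordinates.txt", skiprows=1)
```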

Meanwhile, for fun, I hacked together a words2map 3D VR demo, but I haven't pushed the code for it yet because t-SNE seems even more sensitive to hyperparameters in 3D, and I haven't been able to get many good visualizations beyond the passions.txt one there.

legel avatar Mar 29 '17 15:03 legel