zeno calculate projection in preprocessing

If a user provides embeddings, we should compute the projections as a preprocessing step and cache the result. That would make every interaction afterwards much, much faster. We can also add an option to skip computing projections if we want.
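A rough sketch of what the caching could look like, not the actual zeno implementation. The names `cache_key` and `cached_projection` are made up for illustration, and `project` stands in for whatever projection function is used (t-SNE or otherwise). It hashes the embeddings together with the parameters, so a parameter change results in a fresh computation rather than a stale hit.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List, Sequence

Embedding = Sequence[float]
Point = List[float]


def cache_key(embeddings: Sequence[Embedding], params: dict) -> str:
    """Hash the embeddings and the projection parameters into a short key."""
    payload = json.dumps(
        {"embeddings": [list(e) for e in embeddings], "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_projection(
    embeddings: Sequence[Embedding],
    project: Callable[..., List[Point]],  # hypothetical stand-in for t-SNE
    cache_dir: Path,
    **params,
) -> List[Point]:
    """Return 2D coordinates, calling `project` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"projection_{cache_key(embeddings, params)}.json"
    if path.exists():
        # Cache hit: reuse the coordinates computed during preprocessing.
        return json.loads(path.read_text())
    coords = [list(p) for p in project(embeddings, **params)]
    path.write_text(json.dumps(coords))
    return coords
```

Serializing every embedding to JSON just to build the key would be slow on large datasets; a real version would probably hash the raw bytes or key on the embedding column name instead.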

Feb 12 '23 16:02 cabreraalex

@xnought any thoughts on this? Any downsides? One I can think of is that we have to store the projection coordinates, which uses disk space, but that should be minimal?

Feb 13 '23 18:02 cabreraalex

Depending on the data format, yeah, disk space would not be too bad.

Side note: it could be better to use Parquet when caching columns, for the extra compression.

Feb 13 '23 18:02 xnought

I do like your idea. I think I'll give that a shot next.

Feb 13 '23 18:02 xnought

There is also something else to think about: should users be able to tweak the t-SNE parameters (like perplexity)?

Should users be able to recompute t-SNE? Given how much the results vary with the parameters, maybe?

Feb 13 '23 18:02 xnought

Also, if their dataset is too large and t-SNE ends up taking an eternity, what then?

That would favor our current method, where they can just load one t-SNE instead of preloading all of them.

Feb 13 '23 18:02 xnought

We could add options to the TOML for the t-SNE parameters?

For your last point, if the dataset is too large the current method would actually be worse, because leaving the screen stops the processing and you lose your progress.

Feb 13 '23 19:02 cabreraalex