Move UMAP to a web worker
This PR moves the UMAP projection method's functionality into a web worker, which allows us to do a number of very nice things.
Since the main (render) thread isn't blocked by the worker, we can retain full UI responsiveness while the projection is running. This means we can create some nice UX (allowing the user to cancel out of the projection very easily, allowing them to move the projection smoothly while optimizing, etc).
Theoretically we can move t-SNE into a worker in a similar manner. Shouldn't be too hard to do.
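Roughly, the main-thread side of this looks like the sketch below (illustrative message and type names, not the actual plugin code):

```ts
// Hypothetical message shapes for the main-thread side of the worker protocol.
interface ProjectionRequest {
  vectors: number[][];   // input points to project
  nComponents: number;   // 2 or 3
}

interface ProjectionUpdate {
  type: 'progress' | 'result';
  epoch?: number;
  embedding?: number[][];
}

// Kicks off the projection and returns a cancel function. Because the work
// happens off the render thread, cancelling is just terminating the worker.
function runProjectionInWorker(
  worker: Worker,
  request: ProjectionRequest,
  onUpdate: (update: ProjectionUpdate) => void
): () => void {
  worker.onmessage = (event: MessageEvent) =>
    onUpdate(event.data as ProjectionUpdate);
  worker.postMessage(request);
  return () => worker.terminate();
}
```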
Some trouble spots to consider. There are some annoying workarounds for the build system that are a little unorthodox:
- We need to stringify/blobify the web worker function because we can't build a separate JS file for the web worker.
- Because of this stringification/blobification, we need to replace a magic
  /** UMAP_JS_SCRIPT_CODE */
  comment in the worker code in order to populate it with the umap-js code. This is the only option for building the worker when every asset is concatenated into the html file (a rough sketch follows after this list).
- Not sure if it's overkill, but we may need a fallback option if web workers aren't supported - although at this point I can't imagine anyone using tensorboard in an environment that doesn't support them.
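For concreteness, the stringify/blobify trick looks roughly like this (a sketch with illustrative names, not the exact code in this PR; `umapJsSource` is assumed to hold whatever the build step substitutes for the magic comment):

```ts
// Sketch: build a Worker from a stringified function plus an inlined copy of
// umap-js, since the build system can't emit a separate worker JS file.
// `workerMain` must contain the /** UMAP_JS_SCRIPT_CODE */ magic comment.
function createInlineWorker(workerMain: () => void, umapJsSource: string): Worker {
  const workerSource = `(${workerMain.toString()})()`.replace(
    '/** UMAP_JS_SCRIPT_CODE */',
    umapJsSource
  );
  const blob = new Blob([workerSource], {type: 'application/javascript'});
  return new Worker(URL.createObjectURL(blob));
}
```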
Also... it was suggested to me that it might be worthwhile to think about adding web worker functionality to umap-js itself... I think there are benefits to doing it in tensorboard (because t-SNE can leverage the same infra), but I might explore that for the umap lib.
I have not explored this option too deeply, but one option you have is to use (1) bazel_nodejs build macros to use rollup (example) or (2) bazel_closure's closure_js_binary to create the worker JS bundle. After creating the binary, you'd want to add it to our webfile.zip below:
https://github.com/tensorflow/tensorboard/blob/a2050e2fdf1ad142c351a38936c36815797cef09/tensorboard/BUILD#L201-L216
It may look less hacky than your current code.
One thing that the "hacky" text-parsing code gets us is single-html-file support, which is really nice for the colab widget in particular. I'm not positive that the web worker will work in the colab widget anyway (will have to test that), but it seems like the only way to do it would be to use this approach.
Also, whoa! Is that a React frontend in a tensorboard plugin!? That's amazing work!
Drive-by comment: it probably makes sense to have this work in the NPM module as well, and have it be entirely transparent to the user (by stringifying the computation module and shipping it to the thread, all inside the package).
One note is that you always have to memcpy to a worker (SharedArrayBuffers aren't available across all browsers), so if the dataset is large you may run into problems since you'd need 2x the memory footprint.
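If that ever becomes an issue, one partial mitigation (assuming the vectors live in a typed array) is to transfer the underlying buffer instead of copying it, at the cost of the sender losing access to it afterwards:

```ts
// Sketch: passing the ArrayBuffer in the transfer list moves it to the worker
// rather than copying it; the main thread can no longer read `vectors` after.
function sendVectorsToWorker(worker: Worker, vectors: Float32Array, dim: number): void {
  worker.postMessage(
    {buffer: vectors.buffer, dim, count: vectors.length / dim},
    [vectors.buffer]  // transfer list: zero-copy hand-off
  );
}
```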
@nsthorat
Hmmm... I think self-stringification is actually going to be a pretty big challenge. Getting it to work as a standalone web library might be feasible, but it could be extremely challenging to get working as an npm module inside another app's build system (since the code will have to have access to itself as final, built code).
Because of this and the fact that the colab widget needs to be a single html page with inlined JS, I think the best option is managing library stringification / web worker construction in the embedding projector plugin. Plus, we get the necessary abstractions to also handle t-SNE in a web worker.
As far as memory goes, the JS implementation of UMAP starts to degrade above ~5k points right now, so we're limiting it to 5k points in the embedding projector. This means we're very unlikely to see any memory issues from copying to the worker.
@stephanwlee Mind taking another look at this? Like I said, I think this is actually the most practical / least trouble-causing way of implementing web worker functionality given our build / target constraints.
@cannoneyed a bit late to the party. I found a way to get UMAP to work without blocking the main thread by simply relying on asynchronous fitting.
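For reference, the non-blocking path is essentially umap-js's async fitting API; a minimal sketch (exact options and callback semantics should be checked against the umap-js docs):

```ts
import {UMAP} from 'umap-js';

// Sketch of async fitting: fitAsync yields back to the event loop between
// epochs, so the render thread stays responsive while the embedding converges.
async function fitWithoutBlocking(
  data: number[][],
  onEpoch: (epoch: number) => void
): Promise<number[][]> {
  const umap = new UMAP({nComponents: 3, nNeighbors: 15, minDist: 0.1});
  return umap.fitAsync(data, (epoch) => {
    onEpoch(epoch);  // e.g. update an epoch counter in the UI
  });
}
```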
However, I am a bit torn about the UX. 400 iterations of UMAP fitting take 20+ seconds to run on my M1 Pro. In the meantime, while the calculation is running, the main thread isn't really blocked and the user can rotate and pan the camera, but it is very difficult to tell that they are just looking at old data (whether PCA or an older UMAP run). I hate to say it, but I actually prefer a blocking UI to reduce user confusion. Also, allowing the user to change parameters in the middle of a calculation can invalidate the in-progress run, further complicating things.
I've also tried different UI hints like showing an epoch iteration counter (similar to the t-SNE counter) as well as visualizing each epoch in the scatter plot, but found that the points jump around too much and it isn't nearly as nice as the t-SNE iterations.
One note regarding t-SNE: the browser does feel sluggish while chugging through t-SNE iterations; the framerate drops from 120fps to 10fps. So t-SNE might be a good target if we can offload the computation off of requestAnimationFrame, though the animation would still appear sluggish since each change takes ~300ms and you'd still get ~10fps (but I suppose camera controls would work much better). I think the main difference is that UMAP iterations are much shorter than t-SNE iterations, so the UI feels much more responsive with async UMAP fitting.
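As a sketch of what I mean by offloading the computation off of requestAnimationFrame (illustrative names, not the current projector code): the worker streams positions, and the rAF loop just renders whatever arrived most recently, so camera controls stay smooth even if new positions only land a few times per second.

```ts
// Sketch: decouple computation from rendering. The worker posts position
// updates whenever an iteration finishes; the rAF loop renders the latest
// ones, so camera interaction stays at full framerate.
function renderLatestFromWorker(
  worker: Worker,
  render: (positions: Float32Array) => void
): void {
  let latestPositions: Float32Array | null = null;
  worker.onmessage = (event: MessageEvent) => {
    latestPositions = event.data.positions as Float32Array;
  };
  const loop = () => {
    if (latestPositions !== null) {
      render(latestPositions);
    }
    requestAnimationFrame(loop);
  };
  requestAnimationFrame(loop);
}
```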