transformers.js
transformers.js copied to clipboard
Supporting Multithreading in Web Extensions
Per discussion on microsoft/onnxruntime , I took a stab at implementing the extension example using an offscreen document in order to support multiple threads. There are a few serious drawbacks to this approach that I lay out below, including that the offscreen document seems to interfere with the popup window, so only the context menu workflow works in this PR. But I thought I'd post and give y'all a chance to review and think through next steps.
Overall, things turned out to be a bit more complicated than I thought:
- While the initial issue with multithreading seemed to be that the
background.js
service worker couldn't support the onnx runtime's calls toURL.createObjectURL
, it turned out that the background service worker also will not support the creation of new web Workers, which onnx creates when entering multi-threading mode. So even if we convinced microsoft/onnxruntime to avoid calls toURL.createObjectURL
, it still wouldn't be possible to run multithreaded inference in thebackground.js
service worker. Fortunately, creation of Workers is another valid reason for using an offscreen document. - Even within an offscreen document, the way that onnx's creates new threads will throw errors for violating Chrome extensions' Content Security Policy. To get around that I load and run the model in a sandboxed iframe within the offscreen page.
This offscreen page+sandbox iframe strategy does seem to allow for multithreading, but it has a couple of undesirable properties that might mean needing to try something else entirely.
- Offscreen documents don't really seem intended for staying open for long periods, whereas in this case the offscreen document needs to stay open pretty much indefinitely so that it can load the model once and have it available for multiple inferences. One very serious implication of having the offscreen document open all the time is that it seems to break the
popup
action -- at least I couldn't get the popup to appear after I had initialized the offscreen document. So this PR only works for the workflow where a user selects text and hits "classify" in the context window. One way forward here might be to see if the popup window (and potentially the offscreen document?) can be replaced by a Chrome Side Panel. - The sandbox does not have access to the Cache API, which I dealt with by setting
env.useBrowserCache = false
. There might be ways around this but I haven't given it much thought.
Anyway, interested to hear your thoughts. If there's interest in exploring the SidePanel approach I'd be happy to take a look and post another PR.
Thanks for your investigation! I must admit, I also had quite a few issues when originally creating the web extension template, as there were a ton of hoops to jump through to get something working. 😅
Ideally, this would be solved by onnxruntime-web
itself, since - as you point out - that bug has been present in the library for quite some time now (nearly a year). Though, it would seem as though they are focusing more of their attention on WebGPU support.
I'd be interested in seeing what performance benefits you are able to achieve with this approach - and if the difference is substantial, it would be nice to create a good example for it.
I agree, it's a bummer. An inability to create new Workers in the background script--which was masked by the createObjectURL error--seems like something that will be even harder for the onnxruntime-web folks to work around; in that case we might be waiting on the folks at Chrome to allow service workers to create workers, which might never happen. And even then we'd have to worry about keeping the service worker alive on long-running calculations!
In any case, I'll try replicating the same basic functionality of the current extension example using a Side Panel, which I hope will end up much cleaner than the offscreen document setup.
And yes, when I'm done I'm happy to run some benchmarks to show to what extent (if any!) performance varies between a single thread, multithread-in-offscreen, and multithread-in-sidepanel. I can think of a few experiments to run, but if you have some suggested test models+datasets I'm happy to run those too.
I've tried this architecture and it doesn't seem to affect my popup page. @BrennanBarker
I think chrome.offscreen.Reason.BLOBS, chrome.offscreen.Reason.WORKERS are not necessary, because onnxruntime mainly run inside the iframe. My code isn't in production yet, so if you can, I'd like to elaborate on why offscreen doucument affects your popup page. @BrennanBarker
@lxfater if I'm understanding correctly, you're suggesting moving the inference code into a sandboxed iframe within the popup page? That might work, but since the popup page only stays open in the context of one tab/window (and closes when changing tabs or focusing on the content of the current tab), I don't think it would be possible to send data from the rest of the browser into the extension, as @xenova intended with the (right-click) context menu action. It might work for just the popup action, although I'd be worried that any click outside the popup would kill the popup page and require reloading the model. A Side Panel would work similarly, though, (yet would be available to all pages), and likewise would not require an offscreen page at all; I'll submit a PR for this soon.
The reason the initial try here used an offscreen page was another way for the calculation to be done in a background process, which like a SidePanel could also stay open across pages and be and shared between tabs. I suppose it is unclear whether those offscreen.Reason
s need to be given if the calls to URL.createObjectURL
and new Worker
are being used in a sandbox, but as I understand the offscreen documentation, one needs to supply some Reason for opening the offscreen page in the first place, and these are the two reasons we use it in this case.
No, I think the code should be in the iframe of the offscreen document, I just think these two reasons are redundant. This may invalidate your popup action. Probably. hahaha
I went ahead and reimplemented this example using the Chrome Sidepanel.
I was able to avoid the use of the offscreen document, and both example input workflows (context menu and directly via the sidepanel UI) now work correctly. I found the sidepanel had several nice UI advantages as well.
I still needed to make use of a sandboxed page running in an iframe on the sidepanel UI to handle the way ONNX loads code for additional worker threads. This pattern is suggested by the Chrome Extensions docs, and it seems to work fine, with the one complication that, as mentioned above, the sandbox does not have access to the browser cache API, so models need to be reloaded any time the sidepanel is closed. I suspect there might be a way around this using a custom cache (thanks for those!), but I still haven't thought about it much.
As far as benchmarking -- I'm happy to take this on but might need some guidance as to what a good test case would be. I think we should only expect to see performance increase under some kinds of inference loads, but being less familiar with how ONNX and Transformers.js are taking advantage of multiple threads under the hood I don't have a good sense for when that would be. I'm also not sure if the (very old) hardware I happen to have access to at the moment is a good candidate for showcasing multithreading or running a benchmarking routine efficiently.
I also took the opportunity to implement the model loading progress bar, as that was slightly complicated and involved some message passing between the sidepanel UI and the sandbox. It works though!
Great stuff @BrennanBarker! I'll do a full review soon, but in the meantime, would you mind moving your code to a separate example folder (instead of overwriting the existing example)? Perhaps named something like ./examples/extension-multithreaded
?
SharedArrayBuffer does not work in sandboxed page!! @BrennanBarker