MediaPipe LLM Inference API support in non-Chromium browsers
EDIT: new demo link
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
MacOS Sonoma 14.5
Mobile device if the issue happens on mobile device
No response
Browser and version if the issue happens on browser
No response
Programming Language and version
JavaScript
MediaPipe version
No response
Bazel version
No response
Solution
MediaPipe LLM Inference API
Android Studio, NDK, SDK versions (if issue is related to building in Android environment)
No response
Xcode & Tulsi version (if issue is related to building for iOS)
No response
Describe the actual behavior
Errors are thrown in Firefox (including Nightly) and Safari (including Technology Preview)
Describe the expected behaviour
My MediaPipe code works in all browsers
Standalone code/steps you may have used to try to get what you need
I built and pushed a [demo](https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-client-side-gemma-worker) based on the [MediaPipe LLM Inference API tutorial](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js).
One of my [lines](https://github.com/GoogleChromeLabs/web-ai-demos/blob/main/perf-client-side-gemma-worker/src/worker.js#L11) using MediaPipe is causing issues in non-Chromium browsers. It seems WebGPU-related:
**Firefox (also in Firefox Nightly):**
* I get this error: `TypeError: navigator.gpu is undefined`
* I thought this would work, as per https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API
* Is this expected? Are there flags I'm supposed to turn on?
**Safari:**
* Regular Safari: `Unhandled Promise Rejection: TypeError: undefined is not an object (evaluating 'navigator.gpu.requestAdapter') worker line 11 (llmInference = await LlmInference.createFromModelPath(genai, MODEL_URL);)` - Is this expected? I guess so, as per the support table above.
* Safari Technology Preview: `Unhandled Promise Rejection: Error: The WebGPU device is unable to execute LLM tasks, because the required maxStorageBufferBindingSize is at least 524550144 but your device only supports maxStorageBufferBindingSize of ... <something less>` - Is there anything I can do to work around this?
Other info / Complete Logs
Code to reproduce is available here: https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-client-side-gemma-worker
Hi @maudnals,
Thank you for requesting support for non-Chromium browsers. We have shared this feature request with our team; its implementation will depend on future demand and discussions. However, we cannot provide a timeline at this time.
Regular Safari does not have WebGPU support, and so cannot run our LLM Inference as of now. Safari Technology Preview, however, should generally work.
The `maxStorageBufferBindingSize` error reports the GPU memory limits on your device. Either the version of Safari Technology Preview you were using was too restrictive with its limits (try again with the latest version?), or the specific device you were running on didn't have enough memory for that model, in which case you'd need to try a different device.
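For reference, you can inspect that limit yourself with plain WebGPU (a minimal sketch, no MediaPipe involved; `requestAdapter` and `adapter.limits` are standard WebGPU API):

```js
// Sketch: check the storage-buffer limit that the error message refers to.
const adapter = await navigator.gpu?.requestAdapter();
if (!adapter) {
  console.log('WebGPU is not available in this browser/context.');
} else {
  // LLM Inference compares the model's requirement (524550144 bytes in
  // your log) against this adapter limit.
  console.log('maxStorageBufferBindingSize:',
      adapter.limits.maxStorageBufferBindingSize);
}
```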
Also, in your notes I see you have a bit of a non-standard setup -- can you try the MediaPipe Studio demo and see whether that works?
- Safari: Thank you, you are correct! I tried again with the latest version of Safari Technology Preview and couldn't reproduce the error. The previous version was likely too restrictive with its limits.
- Firefox: I found out that WebGPU is only supported in a web worker in Firefox when a dedicated flag (`dom.webgpu.workers.enabled`) is on. My demo uses a web worker.
Questions:
- Is there a plan to add a fallback, or is the current behavior intended? At the moment, in a context where WebGPU is not supported (e.g. in Firefox without the mentioned flag), an error is thrown: `navigator.gpu is undefined`. I expected MediaPipe to know to fall back to a non-WebGPU implementation.
- You mention my setup is a bit unusual; would you please expand? Are you referring to the web worker or to something else?
Currently for LLM Inference on web, there is only a WebGPU implementation-- unlike our other non-experimental APIs, we do not have any alternative implementation, and thus unfortunately no fallback is possible.
And yes, by non-standard, I just meant that you didn't appear to be using one of the stock example scripts/setups, since at the very least you were using web workers (I am unable to see the code for full details, though, since the initial link of https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-client-side-gemma-worker appears broken at the moment).
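In the meantime, if you'd rather degrade gracefully than let the error propagate, a feature check along these lines should work (just a sketch using standard WebGPU detection; `showFallbackUi` is a hypothetical stand-in for whatever your app does instead):

```js
// Sketch: detect WebGPU before creating the LLM Inference task.
// Works in both window and worker scope (e.g. Firefox workers without
// dom.webgpu.workers.enabled will fail the first check).
async function webGpuAvailable() {
  if (!('gpu' in navigator)) return false;  // API not exposed at all
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;                  // exposed, but no usable adapter
}

if (await webGpuAvailable()) {
  // Safe to call LlmInference.createFromModelPath(...) here.
} else {
  showFallbackUi();  // hypothetical: explain, or fall back to server-side inference
}
```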
> Currently for LLM Inference on web, there is only a WebGPU implementation-- unlike our other non-experimental APIs, we do not have any alternative implementation, and thus unfortunately no fallback is possible.
Noted, thanks. I see it's called out in your docs, which is great: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js#browser_compatibility
> You were using web workers
Yes, I'm using a web worker. I've updated the demo links; sorry for the earlier broken link:
- Deployed demo: https://chrome.dev/web-ai-demos/perf-client-side-gemma-worker/
- Code: https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-worker-gemma
Note: While I understand they're not listed in the stock setups, in practice web workers can make a notable performance difference during model preparation (right after download). In my demos, moving the MediaPipe code to a web worker was an easy way to transform the UX, going from a temporarily frozen web page to a responsive one. So they can be an attractive option for web developers using the MediaPipe library. More details here:
- https://web.dev/articles/client-side-ai-performance#offload_expensive_tasks_to_a_web_worker
- https://web.dev/articles/client-side-ai-performance#move_inference_to_a_web_worker (no performance win, but cleaner code if other MediaPipe code is already in a web worker)
By the way, I'm not exactly sure what happens under the hood during what I called model preparation. But I did observe empirically that moving this step to a web worker was very beneficial for page performance. I'd welcome your thoughts!
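For context, here's roughly what the worker split in my demo looks like (a simplified sketch; the file names, CDN path, and model path are placeholders, and the real code is in the repo linked above):

```js
// main.js -- keeps the UI thread responsive; all MediaPipe work happens in the worker.
const worker = new Worker('worker.js', { type: 'module' });

worker.postMessage('Why do web workers help page responsiveness?');
worker.onmessage = (event) => {
  document.querySelector('#output').textContent = event.data;
};
```

```js
// worker.js -- model download, preparation, and inference, off the main thread.
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Placeholder model path; the tutorial uses a Gemma model converted for the web.
const MODEL_URL = '/assets/gemma-2b-it-gpu-int4.bin';

const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');
const llmInference = await LlmInference.createFromModelPath(genai, MODEL_URL);

self.onmessage = async (event) => {
  const response = await llmInference.generateResponse(event.data);
  self.postMessage(response);
};
```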
We have generally not provided demos for WebWorker support, as the setup is a bit out of scope for our AI-focused onboarding guides. I agree with you, though, that WebWorkers are important, especially as models are getting larger and inference times are going up. Thanks also for setting up this awesome demo.
If I understand correctly, the issue has now mostly been resolved, so I would like to close it. We will discuss internally if we can add a demo for WebWorkers to our getting started guides.
> We will discuss internally if we can add a demo for WebWorkers to our getting started guides.
Sounds great! As a lower-effort option, feel free to link to https://web.dev/articles/client-side-ai-performance from your content. In any case, we'll be happy to help; feel free to ping me.
With this in mind, closing this issue SGTM.