
MediaPipe LLM Inference API support in non-Chromium browsers

maudnals opened this issue Aug 08 '24 · 3 comments

EDIT: updated demo links below

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

macOS Sonoma 14.5

Mobile device if the issue happens on mobile device

No response

Browser and version if the issue happens on browser

No response

Programming Language and version

JavaScript

MediaPipe version

No response

Bazel version

No response

Solution

MediaPipe LLM Inference API

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

Errors are thrown in Firefox (including Nightly) and Safari (including Technology Preview)

Describe the expected behavior

My MediaPipe code works in all browsers

Standalone code/steps you may have used to try to get what you need

I built and pushed a [demo](https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-client-side-gemma-worker) based on the [MediaPipe LLM Inference API tutorial](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js).
One of my [lines](https://github.com/GoogleChromeLabs/web-ai-demos/blob/main/perf-client-side-gemma-worker/src/worker.js#L11) using MediaPipe is causing issues in non-Chromium browsers. It seems WebGPU-related: 
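For context, the relevant part of the worker looks roughly like this (a sketch, not the exact demo code; the CDN path follows the MediaPipe tutorial linked above, and `MODEL_URL` is a placeholder for the model file the demo downloads):

```js
// worker.js (sketch)
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

const MODEL_URL = '/path/to/model.bin'; // placeholder; the demo points at a Gemma model

// Resolve the WASM fileset for GenAI tasks, as in the tutorial.
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);

// Line 11 of the demo's worker: the call that fails in non-Chromium browsers.
const llmInference = await LlmInference.createFromModelPath(genai, MODEL_URL);
```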

**Firefox (also in Firefox Nightly):**
* I get this error: `TypeError: navigator.gpu is undefined`
* I thought this would work, as per
https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API
* Is this expected? Are there flags I'm supposed to turn on?


**Safari:**
* Regular Safari: `Unhandled Promise Rejection: TypeError: undefined is not an object (evaluating 'navigator.gpu.requestAdapter')  worker line 11 (llmInference = await LlmInference.createFromModelPath(genai, MODEL_URL);)` - Is this expected? I assume so, per the browser compatibility table linked above.
* Safari Technology Preview: `Unhandled Promise Rejection: Error: The WebGPU device is unable to execute LLM tasks, because the required maxStorageBufferBindingSize is at least 524550144 but your device only supports maxStorageBufferBindingSize of ... <something less>` - Is there anything I can do to work around this? (A limits-inspection sketch follows below.)
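For reference, here's a minimal sketch (plain WebGPU API, no MediaPipe) covering both failure modes above: checking whether `navigator.gpu` exists at all, and inspecting the storage-buffer limit the adapter reports:

```js
if (!('gpu' in navigator)) {
  // Stable Firefox and regular Safari case: WebGPU is not exposed at all.
  console.warn('WebGPU is not available in this context.');
} else {
  const adapter = await navigator.gpu.requestAdapter();
  if (adapter) {
    // Safari Technology Preview case: compare the adapter's limit against
    // what the model needs (524550144 bytes, per the error above).
    console.log(adapter.limits.maxStorageBufferBindingSize);
  }
}
```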

Other info / Complete Logs

Code to reproduce is available here: https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-client-side-gemma-worker

maudnals, Aug 08 '24

Hi @maudnals,

Thank you for requesting support for non-Chromium browsers. We have shared this feature request with our team; its implementation will depend on future demand and discussions. However, we cannot provide a timeline at this time.

kuaashish, Aug 09 '24

Regular Safari does not have WebGPU support, and so cannot run our LLM Inference as of now. Safari Technology Preview, however, should generally work.

The maxStorageBufferBindingSize error should be reporting the GPU memory limits on your device. Either the version of Safari Technology Preview you were using was too restrictive with its limits (try again with the latest version?), or the specific device you were running on didn't have enough memory for that model, in which case you'd need to try a different device.

Also, in your notes I see you have a bit of a non-standard setup-- can you try with the MediaPipe Studio demo and see if that works or not?

tyrmullen, Oct 16 '24

  • Safari: Thank you, you are correct! I tried again with the latest version of Safari Technology Preview and couldn't reproduce the error. The previous version was likely too restrictive with its limits.
  • Firefox: I found out that WebGPU is only available in a Firefox web worker when a dedicated flag is on (dom.webgpu.workers.enabled), and my demo uses a web worker. (See the detection sketch below.)
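For anyone else hitting this, here's a minimal sketch of the guard I'd now put at the top of the worker (the message shape is hypothetical):

```js
// Top of worker.js: fail gracefully if WebGPU isn't exposed in this context.
// In Firefox, workers only see navigator.gpu when dom.webgpu.workers.enabled is on.
if (!('gpu' in navigator)) {
  // Hypothetical message shape; adapt it to your app's worker protocol.
  postMessage({ error: 'WebGPU is not available in this worker context.' });
} else {
  // Safe to proceed with FilesetResolver.forGenAiTasks(...) and
  // LlmInference.createFromModelPath(...) as usual.
}
```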

Questions:

  • Is there a plan to add a fallback, or is the current behavior intended? At the moment, in a context where WebGPU is not supported (e.g. Firefox without the flag mentioned above), an error is thrown: navigator.gpu is undefined. I expected MediaPipe to know to fall back to a non-WebGPU path.
  • You mention my setup is a bit unusual; could you expand on that? Are you referring to the web worker or to something else?

maudnals, Oct 18 '24

Currently for LLM Inference on web, there is only a WebGPU implementation-- unlike our other non-experimental APIs, we do not have any alternative implementation, and thus unfortunately no fallback is possible.

And yes, by non-standard I just meant that your issue likely came from not using one of the stock example scripts/setups, since at the very least you were using web workers. (I'm unable to see the code for full information, though, since the initial link of https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-client-side-gemma-worker appears broken at the moment.)

tyrmullen, Oct 22 '24

> Currently for LLM Inference on web, there is only a WebGPU implementation-- unlike our other non-experimental APIs, we do not have any alternative implementation, and thus unfortunately no fallback is possible.

Noted, thanks. I see it's called out in your docs, which is great: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js#browser_compatibility

> You were using web workers

Yes, I'm using a web worker. Updated demo links below; sorry for the earlier broken link:

  • Deployed demo: https://chrome.dev/web-ai-demos/perf-client-side-gemma-worker/
  • Code: https://github.com/GoogleChromeLabs/web-ai-demos/tree/main/perf-worker-gemma

Note: While I understand they're not listed in the stock setups, in practice web workers can make a notable performance difference during model preparation (right after download). In my demos, moving the MediaPipe code to a web worker was an easy way to transform the UX, going from a temporarily frozen web page to a responsive one. So they can be an attractive option for web developers using the MediaPipe library. More details here:

  • https://web.dev/articles/client-side-ai-performance#offload_expensive_tasks_to_a_web_worker
  • https://web.dev/articles/client-side-ai-performance#move_inference_to_a_web_worker (no performance win, but cleaner code if other MediaPipe code is already in a web worker)

By the way, I'm not exactly sure what happens under the hood during what I called model preparation, but I did observe empirically that moving this step to a web worker was very beneficial for page performance. I'd welcome your thoughts!
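To make the pattern concrete, here's a minimal sketch of the setup I'm describing (message shapes and the #output element are hypothetical; the streaming callback follows the LLM Inference API's generateResponse signature):

```js
// main.js: keep the page responsive by doing model download/preparation
// and inference in a module worker rather than on the main thread.
const worker = new Worker('worker.js', { type: 'module' });

worker.onmessage = (event) => {
  // Hypothetical message shape: the worker streams partial results back.
  document.querySelector('#output').textContent += event.data.partialResult;
};

worker.postMessage({ prompt: 'Why are web workers useful?' });
```

```js
// worker.js: once llmInference has been created (see the earlier sketch),
// answer prompts and stream partial results back to the page.
self.onmessage = async (event) => {
  await llmInference.generateResponse(
    event.data.prompt,
    (partialResult, done) => postMessage({ partialResult, done })
  );
};
```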

maudnals, Oct 22 '24

We have generally not provided demos for WebWorker support, as the setup is a bit out of scope for our AI-focused onboarding guides. I agree with you, though, that WebWorkers are important, especially as models get larger and inference times go up. Thanks also for setting up this awesome demo.

If I understand correctly, the issue has now mostly been solved. Hence, I would like to close this issue. We will discuss internally if we can add a demo for WebWorkers to our getting started guides.

schmidt-sebastian, Oct 28 '24

> We will discuss internally if we can add a demo for WebWorkers to our getting started guides.

Sounds great! As a lower-effort option, feel free to link to https://web.dev/articles/client-side-ai-performance from your content. In any case, we'd be happy to support; ping me.

With this in mind, closing this issue SGTM.

maudnals, Oct 29 '24