Web fails to instantiate Gemma 3n even with sufficient RAM
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
ChromeOS 136, Windows 11 24H2
MediaPipe Tasks SDK version
0.10.23
Task name (e.g. Image classification, Gesture recognition etc.)
LLM Inference
Programming Language and version (e.g. C++, Python, Java)
JavaScript
Describe the actual behavior
Gemma 3n E2B fails to create (LlmInference.createFromOptions) with "RangeError: Array buffer allocation failed" on machines with 8GB RAM (4-5GB free)
Describe the expected behaviour
Gemma 3n E2B creates successfully (LlmInference.createFromOptions), with no errors
Standalone code/steps you may have used to try to get what you need
https://gist.github.com/cosmicallyrun/2f4b407682c94f226388df0db96e069c Ran the E2B .task file from https://huggingface.co/google/gemma-3n-E2B-it-litert-preview on devices with 8GB RAM (4-5GB free). Other models (Gemma 3 1B-it, Gemma 2 2B-it) work fine; a minimal version of the initialization call is sketched below.
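For reference, a minimal sketch of the initialization call, based on the official web guide (the CDN pin and local model path are illustrative):

```js
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Resolve the WASM assets shipped with @mediapipe/tasks-genai 0.10.23.
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.23/wasm'
);

// Fails here for the Gemma 3n E2B .task file with
// "RangeError: Array buffer allocation failed"; the same call succeeds
// for the Gemma 3 1B and Gemma 2 2B .task files.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: {
    // Local copy of the file from google/gemma-3n-E2B-it-litert-preview (path is illustrative).
    modelAssetPath: '/models/gemma-3n-E2B-it-int4.task',
  },
  maxTokens: 1024,
});
```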
Other info / Complete Logs
https://developers.googleblog.com/en/introducing-gemma-3n/ states "Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage... the models can operate with a dynamic memory footprint of just 2GB and 3GB".
Hi @cosmicallyrun,
To assist us in reproducing or investigating the issue, please provide the complete documentation you are following or share your setup details. This will help us to understand the issue better.
Thank you!!
Hi, thanks for looking into this! Here are the complete setup details:
Documentation and codebase:
- Following the official MediaPipe LLM Inference documentation: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js
- Code based on the official MediaPipe sample from https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/llm_inference/js
- Complete implementation: https://gist.github.com/cosmicallyrun/2f4b407682c94f226388df0db96e069c
System details:
- ChromeOS 136 and Windows 11 24H2
- CPU Architecture: x86
- RAM: 8GB total, 4-5GB free at time of error
- Models downloaded from Hugging Face (E2B at https://huggingface.co/google/gemma-3n-E2B-it-litert-preview) and loaded locally
Model comparison:
- Gemma 3 1B task file: ~500MB ✅ (works)
- Gemma 2 2B task file: ~2.4GB ✅ (works)
- Gemma 3n E2B task file: ~2.9GB ❌ (fails with RangeError)
Error log:
tasks-genai:7 The powerPreference option is currently ignored when calling requestAdapter() on Windows. See https://crbug.com/369219127
tasks-genai:7 Experimental Chromium WGSL subgroup support detected. Enabling this feature in the inference engine.
auto.html:515 LlmInference creation error: RangeError: Array buffer allocation failed
at new ArrayBuffer (<anonymous>)
at new Uint8Array (<anonymous>)
at tasks-genai:7:39567
at async Rr.fa (tasks-genai:7:39376)
at async Rr.W (tasks-genai:7:38393)
at async hr (tasks-genai:7:35331)
Key issue:
The error occurs during LlmInference.createFromOptions() at the ArrayBuffer allocation step. Despite having 4-5GB free RAM and the Gemma 3n documentation claiming a "dynamic memory footprint of just 2-3GB", the model fails to initialize.
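A standalone probe like the one below (not MediaPipe-specific; the size is approximate) can help confirm whether the browser will grant a single contiguous buffer of the model's size at all, independent of how much system RAM is free:

```js
// Probe whether this browser/tab will allocate one contiguous buffer roughly
// the size of the Gemma 3n E2B .task file (~2.9 GB).
const bytes = Math.round(2.9 * 1024 ** 3);
try {
  const probe = new Uint8Array(bytes); // throws RangeError if the allocation is refused
  console.log(`Allocated ${probe.byteLength} bytes`);
} catch (e) {
  console.error('Allocation refused:', e); // comparable to the reported "Array buffer allocation failed"
}
```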
Hi @whhone,
Could you please look into this issue?
Thank you!!
It's not even due to the file size, as the gemma3-12b-it-int4-web.task model works fine. The issue arises only with the new Gemma 3n models, which result in an "Array buffer allocation failed" error.
Hi, is there any update on this issue?
As far as I can tell, it is known and expected that the current web SDK (0.10.23) does not support Gemma 3n. We will need to wait for support to land in a future release.
@tyrmullen probably can give more details.
Correct -- there's no web support yet for Gemma 3n, but we're working on it!
I believe the Gemma 3n preview will likely only run on high-end Android devices at the moment.
Currently supported LLM architectures on web include:
- all text-only Gemma 3 variants
- MedGemma-27B
- Gemma 2 2B
- the older architectures we initially launched with (Phi 2, Falcon 1B, Stable LM 3B, Gemma 1 2B & 7B)
Additionally, we support the Gemma3-1B-int4 "QAT" model (the filename without the "-web.task" suffix) using the same systems as on Android, so that model is compatible with tools like the fine-tuning colabs.
Hi, I hope you’re doing well! I was wondering if there have been any updates about this issue. I understand that this can take time, but it would be helpful to know if there’s an estimated timeline for progress or a resolution. Thanks for your hard work and support!
I'm also waiting for this. I have a great idea for a web app, and I want to build it with Gemma 3n because it's such a great model.
Knowing an estimated date for when it will be supported would also help in figuring out whether it will be ready in time for the Gemma 3n Kaggle competition. Thanks!
Yeah, Gemma 3n E2B's performance is so much better than Gemma 3 1B/Gemma 2 2B. Also, the 0.10.25 release notes say "Web LLM: Minor refactoring to allow more usage of newer LLM code paths"?
Seems like the latest builds have added support, but the NPM package hasn't been updated yet.
Gemma 3n QAT int4-quantized with full multimodality support (vision and audio) has now been fully released, as of npm version 0.10.25. The models are available here:
- E4B: https://huggingface.co/google/gemma-3n-E4B-it-litert-lm/blob/main/gemma-3n-E4B-it-int4-Web.litertlm
- E2B: https://huggingface.co/google/gemma-3n-E2B-it-litert-lm/blob/main/gemma-3n-E2B-it-int4-Web.litertlm
Documentation on multimodality to follow shortly on our web guide.
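A minimal text-only loading sketch for the released bundle (the CDN pin and local path are illustrative; multimodal options will be covered in the upcoming guide):

```js
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai'; // >= 0.10.25

const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.25/wasm'
);

// Download the .litertlm from the Hugging Face link above and serve it locally.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: '/models/gemma-3n-E2B-it-int4-Web.litertlm' },
  maxTokens: 1024,
});

console.log(await llm.generateResponse('Summarize Per-Layer Embeddings in one sentence.'));
```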
THANK YOU!!!