Web fails to instantiate Gemma 3n even with sufficient RAM
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
ChromeOS 136, Windows 11 24H2
MediaPipe Tasks SDK version
0.10.23
Task name (e.g. Image classification, Gesture recognition etc.)
LLM Inference
Programming Language and version (e.g. C++, Python, Java)
JavaScript
Describe the actual behavior
Gemma 3n E2B fails to create (LlmInference.createFromOptions) with "RangeError: Array buffer allocation failed" on machines with 8GB RAM (4-5GB free)
Describe the expected behaviour
Gemma 3n E2B creates successfully (LlmInference.createFromOptions), with no errors
Standalone code/steps you may have used to try to get what you need
https://gist.github.com/cosmicallyrun/2f4b407682c94f226388df0db96e069c Ran the E2B .task file from https://huggingface.co/google/gemma-3n-E2B-it-litert-preview on devices with 8GB RAM (4-5GB free). Other models (Gemma 3 1B-it, Gemma 2 2B-it) work fine; a minimal version of the initialization call is sketched below.
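For reference, a minimal sketch of the initialization call, based on the official web guide (the CDN pin and local model path are illustrative):

```js
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Resolve the WASM assets shipped with @mediapipe/tasks-genai 0.10.23.
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.23/wasm'
);

// Fails here for the Gemma 3n E2B .task file with
// "RangeError: Array buffer allocation failed"; the same call succeeds
// for the Gemma 3 1B and Gemma 2 2B .task files.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: {
    // Local copy of the file from google/gemma-3n-E2B-it-litert-preview (path is illustrative).
    modelAssetPath: '/models/gemma-3n-E2B-it-int4.task',
  },
  maxTokens: 1024,
});
```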
Other info / Complete Logs
https://developers.googleblog.com/en/introducing-gemma-3n/ states "Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage... the models can operate with a dynamic memory footprint of just 2GB and 3GB".
Hi @cosmicallyrun,
To assist us in reproducing or investigating the issue, please provide the complete documentation you are following or share your setup details. This will help us to understand the issue better.
Thank you!!
Hi, thanks for looking into this! Here are the complete setup details:
Documentation and codebase:
- Following the official MediaPipe LLM Inference documentation: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js
- Code based on the official MediaPipe sample from https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/llm_inference/js
- Complete implementation: https://gist.github.com/cosmicallyrun/2f4b407682c94f226388df0db96e069c
System details:
- ChromeOS 136 and Windows 11 24H2
- CPU Architecture: x86
- RAM: 8GB total, 4-5GB free at time of error
- Models downloaded from Hugging Face (E2B at https://huggingface.co/google/gemma-3n-E2B-it-litert-preview) and loaded locally
Model comparison:
- Gemma 3 1B task file: ~500MB ✅ (works)
- Gemma 2 2B task file: ~2.4GB ✅ (works)
- Gemma 3n E2B task file: ~2.9GB ❌ (fails with RangeError)
Error log:
tasks-genai:7 The powerPreference option is currently ignored when calling requestAdapter() on Windows. See https://crbug.com/369219127
tasks-genai:7 Experimental Chromium WGSL subgroup support detected. Enabling this feature in the inference engine.
auto.html:515 LlmInference creation error: RangeError: Array buffer allocation failed
at new ArrayBuffer (<anonymous>)
at new Uint8Array (<anonymous>)
at tasks-genai:7:39567
at async Rr.fa (tasks-genai:7:39376)
at async Rr.W (tasks-genai:7:38393)
at async hr (tasks-genai:7:35331)
Key issue:
The error occurs during LlmInference.createFromOptions() at the ArrayBuffer allocation step. Despite having 4-5GB free RAM and the Gemma 3n documentation claiming a "dynamic memory footprint of just 2-3GB", the model fails to initialize.
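A standalone probe like the one below (not MediaPipe-specific; the size is approximate) can help confirm whether the browser will grant a single contiguous buffer of the model's size at all, independent of how much system RAM is free:

```js
// Probe whether this browser/tab will allocate one contiguous buffer roughly
// the size of the Gemma 3n E2B .task file (~2.9 GB).
const bytes = Math.round(2.9 * 1024 ** 3);
try {
  const probe = new Uint8Array(bytes); // throws RangeError if the allocation is refused
  console.log(`Allocated ${probe.byteLength} bytes`);
} catch (e) {
  console.error('Allocation refused:', e); // comparable to the reported "Array buffer allocation failed"
}
```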
Hi @whhone,
Could you please look into this issue?
Thank you!!
It's not even due to the file size, as the gemma3-12b-it-int4-web.task model works fine. The issue arises only with the new Gemma 3n models, which result in an "Array buffer allocation failed" error.
Hi, is there any update on this issue?
As far as I can tell, it is known and expected that the current web SDK (0.10.23) does not support Gemma 3n. We will need to wait for support to land in a future release.
@tyrmullen probably can give more details.
Correct -- there's no web support yet for Gemma 3n, but we're working on it!
I believe the Gemma 3n preview will likely only run on high-end Android devices at the moment.
Currently supported LLM architectures on web include:
- all text-only Gemma 3 variants
- MedGemma-27B
- Gemma 2 2B
- the older architectures we initially launched with (Phi 2, Falcon 1B, Stable LM 3B, Gemma 1 2B & 7B)
Additionally, we support the Gemma3-1B-int4 "QAT" model (the filename without the "-web.task" suffix) using the same systems as on Android, so that model is compatible with tools like the fine-tuning colabs.
Hi, I hope you’re doing well! I was wondering if there have been any updates about this issue. I understand that this can take time, but it would be helpful to know if there’s an estimated timeline for progress or a resolution. Thanks for your hard work and support!
I'm also waiting for this. I have a great idea for a web app, and I want to build it with Gemma 3n because it's such a great model.
Knowing an estimated date for when it will be supported would also help in figuring out whether it will be ready in time for the Gemma 3n Kaggle competition. Thanks!
Yeah, Gemma 3n E2B's performance is so much better than Gemma 3 1B/Gemma 2 2B. Also, the 0.10.25 release notes say "Web LLM: Minor refactoring to allow more usage of newer LLM code paths"?
Seems like the latest builds have added support, but the NPM package hasn't been updated yet.
Gemma 3n QAT int4-quantized with full multimodality support (vision and audio) has now been fully released, as of npm version 0.10.25. The models are available here:
- E4B: https://huggingface.co/google/gemma-3n-E4B-it-litert-lm/blob/main/gemma-3n-E4B-it-int4-Web.litertlm
- E2B: https://huggingface.co/google/gemma-3n-E2B-it-litert-lm/blob/main/gemma-3n-E2B-it-int4-Web.litertlm
Documentation on multimodality to follow shortly on our web guide.
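A minimal text-only loading sketch for the released bundle (the CDN pin and local path are illustrative; multimodal options will be covered in the upcoming guide):

```js
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai'; // >= 0.10.25

const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.25/wasm'
);

// Download the .litertlm from the Hugging Face link above and serve it locally.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: '/models/gemma-3n-E2B-it-int4-Web.litertlm' },
  maxTokens: 1024,
});

console.log(await llm.generateResponse('Summarize Per-Layer Embeddings in one sentence.'));
```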
THANK YOU!!!