
Model Request: Gemma 3

Open · rogerdcarvalho opened this issue 9 months ago · 7 comments

Google has released version 3 of Gemma. I'd love support for:

- google/gemma-3-1b-it
- google/gemma-3-4b-it
- google/gemma-3-12b-it
- google/gemma-3-27b-it

rogerdcarvalho · Mar 12 '25

Agreed, you get more bang for your buck with these models, so it would be great to have them on WebLLM.

Gavriel94 · Mar 12 '25

The problem is that mlc-llm is not yet ready for Gemma 3: https://github.com/mlc-ai/mlc-llm/issues/3171. I really hope they look into it soon.

nico-martin · Mar 12 '25

It seems the mlc-llm compiler has added support for Gemma 3, but it does not work with the webgpu build target:

import { CreateMLCEngine } from '@mlc-ai/web-llm';

// Register a custom model: `model` points at the converted weights,
// `model_lib` at the webgpu wasm produced by the mlc_llm compiler.
const engine = await CreateMLCEngine('Nicos-Gemma3', {
  appConfig: {
    model_list: [{
      model_id: 'Nicos-Gemma3',
      model: 'https://uploads.nico.dev/mlc-llm-libs/gemma-3-4b-it-q4f16_1-MLC/',
      model_lib: 'https://uploads.nico.dev/mlc-llm-libs/gemma-3-4b-it-q4f16_1-MLC/lib/gemma-3-4b-it-q4f16_1-webgpu.wasm',
    }],
  },
});

// Run a simple chat completion against the loaded model.
await engine.chat.completions.create({
  messages: [
    { role: 'system', content: 'You are a helpful AI assistant.' },
    { role: 'user', content: 'What is a GPU?' },
  ],
});

const fullReply = await engine.getMessage();

I get the following console error:

[screenshot of the console error]

nico-martin · Mar 18 '25

@nico-martin how did you create the .wasm file?

johnyquest7 · Apr 08 '25

I used the mlc_llm compiler, following this tutorial: https://llm.mlc.ai/docs/compilation/compile_models.html

I downloaded the model from Hugging Face, created the chat config, converted the weights, and generated the wasm lib.
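
Roughly, those steps from the tutorial look like the following sketch. The local paths and the --conv-template value are placeholders, so check the docs for the exact names:

# Convert the Hugging Face weights to MLC format with q4f16_1 quantization.
mlc_llm convert_weight ./dist/models/gemma-3-4b-it/ \
    --quantization q4f16_1 \
    -o ./dist/gemma-3-4b-it-q4f16_1-MLC

# Generate mlc-chat-config.json (the conversation template name is a placeholder).
mlc_llm gen_config ./dist/models/gemma-3-4b-it/ \
    --quantization q4f16_1 --conv-template gemma_instruction \
    -o ./dist/gemma-3-4b-it-q4f16_1-MLC/

# Compile the model library for the webgpu target.
mlc_llm compile ./dist/gemma-3-4b-it-q4f16_1-MLC/mlc-chat-config.json \
    --device webgpu \
    -o ./dist/libs/gemma-3-4b-it-q4f16_1-webgpu.wasm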

nico-martin · Apr 08 '25

@nico-martin I looked a bit further into the error, and it comes from the https://github.com/mlc-ai/tokenizers-cpp package, which wraps another library. The last release seems to be quite old, so I rebuilt that package. Both @mlc-ai/web-xgrammar and the main web-llm package depend on it, so those needed to be rebuilt too. Lastly, I rebuilt the @mlc-ai/web-runtime package just to make sure everything was up to date.

One thing I noticed about these packages, especially the xgrammar package, is that they have not had releases in months. The xgrammar one also passes only 10/20 tests, and it actually fails to build because some of the functions it references no longer exist in the project for the wasm build.

For your URLs, the model one needs to have a resolve/main/mlc-chat-config.json file and the overall repo structure. When I switched it out for the Hugging Face URL of the model and kept your wasm build, I got: [screenshot of the error]

I rebuilt the wasm lib locally, based on https://huggingface.co/mlc-ai/gemma-3-4b-it-q4f16_1-MLC, and was able to load that.
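
Putting those two fixes together, a minimal sketch of a config along these lines is what loaded for me. The model_lib URL is a placeholder for wherever you host the locally rebuilt wasm:

import { CreateMLCEngine } from '@mlc-ai/web-llm';

// The `model` URL points at a Hugging Face repo laid out like an MLC model,
// so web-llm can fetch <model>/resolve/main/mlc-chat-config.json from it.
// The `model_lib` URL is a hypothetical host for the locally rebuilt wasm.
const engine = await CreateMLCEngine('gemma-3-4b-it-q4f16_1-MLC', {
  appConfig: {
    model_list: [{
      model_id: 'gemma-3-4b-it-q4f16_1-MLC',
      model: 'https://huggingface.co/mlc-ai/gemma-3-4b-it-q4f16_1-MLC',
      model_lib: 'http://localhost:8000/libs/gemma-3-4b-it-q4f16_1-webgpu.wasm',
    }],
  },
});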

After all that, it does go through each chunk of the model and load it, and the engine does say it initializes, but when it starts processing a prompt it throws a TypeError for C.vecStringFromJSArray, which is related to the bindings generated by Emscripten. I am thinking this could be from xgrammar, since it had so many issues building.

It seems a version bump is needed across all the packages, along with fixes for the build errors and the broken bindings.

Wetbikeboy2500 · Apr 09 '25

Thanks for the discussion here! The tokenizers-cpp-related issue should be fixed as of 0.2.79, as discussed in the PR mentioned in this thread.

Gemma 3 support is still blocked by a sliding-window-related issue that needs to be addressed in MLC-LLM (it affects correctness).

In the meantime, feel free to try out the newly supported Qwen3 models, which let you toggle thinking on and off.
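
A minimal sketch of the thinking toggle, assuming the extra_body.enable_thinking flag that shipped with Qwen3 support and a prebuilt model id along these lines (check the repo's Qwen3 example and model list for the exact names):

import { CreateMLCEngine } from '@mlc-ai/web-llm';

// Model id is an assumption; see web-llm's prebuilt model list for exact ids.
const engine = await CreateMLCEngine('Qwen3-4B-q4f16_1-MLC');

const reply = await engine.chat.completions.create({
  messages: [{ role: 'user', content: 'What is a GPU?' }],
  // Toggle the model's reasoning trace per request (assumed flag name).
  extra_body: { enable_thinking: false },
});
console.log(reply.choices[0].message.content);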

CharlieFRuan · May 05 '25