transformers.js
Phi-3.5-vision-instruct: Uncaught RuntimeError: Aborted(). Build with -sASSERTIONS for more info
System Info
@huggingface/[email protected]
Browser: Chromium Version 133.0.6845.0 (Developer Build) (64-bit)
Node/bundler: none, simple HTML setup
Environment/Platform
- [x] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)
Description
I just wanted to test your work on https://github.com/huggingface/transformers.js/pull/1094.
I took the example code and simply turned it into something you can put into an HTML file and run.
But the result is:
Do you have any idea what's going on? It seems to download the entire model, then tries to load it, but crashes shortly after.
Reproduction
<body>
  <script type="module">
    import {
      AutoProcessor,
      AutoModelForCausalLM,
      TextStreamer,
      load_image,
    } from "https://cdn.jsdelivr.net/npm/@huggingface/[email protected]";

    // Load processor and model
    const model_id = "onnx-community/Phi-3.5-vision-instruct";
    const processor = await AutoProcessor.from_pretrained(model_id, {
      legacy: true, // Use legacy to match python version
    });
    const model = await AutoModelForCausalLM.from_pretrained(model_id, {
      dtype: {
        vision_encoder: "q4", // 'q4' or 'q4f16'
        prepare_inputs_embeds: "q4", // 'q4' or 'q4f16'
        model: "q4f16", // 'q4f16'
      },
    });

    // Load image
    const image = await load_image("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/meme.png");

    // Prepare inputs
    const messages = [
      { role: "user", content: "<|image_1|>What's funny about this image?" },
    ];
    const prompt = processor.tokenizer.apply_chat_template(messages, {
      tokenize: false,
      add_generation_prompt: true,
    });
    const inputs = await processor(prompt, image, { num_crops: 4 });

    // (Optional) Set up text streamer
    const streamer = new TextStreamer(processor.tokenizer, {
      skip_prompt: true,
      skip_special_tokens: true,
    });

    // Generate response
    const output = await model.generate({
      ...inputs,
      streamer,
      max_new_tokens: 256,
    });
  </script>
</body>
Hello! Did you find a workaround for this, please? I am having the exact same issue with Phi-4-mini and Qwen2.5-1.5B, but I don't have any problems loading DeepSeek or Granite.
Having the same issue on a MacBook with an M1 chip and 32 GB of RAM.
I think this is just an OOM that occurs when the model needs to allocate more than 4 GB to load. I encountered the same in #1286.
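For context, here is a rough sketch of where that 4 GB ceiling comes from, assuming the usual 32-bit wasm build (wasm32 can address at most 65,536 pages of 64 KiB each):

// Back-of-envelope: the 32-bit WebAssembly address space caps linear memory.
const WASM_PAGE_BYTES = 64 * 1024; // 64 KiB per wasm page
const MAX_PAGES = 65536;           // maximum pages addressable with 32-bit pointers
console.log(`${(MAX_PAGES * WASM_PAGE_BYTES) / 2 ** 30} GiB`); // 4 GiB hard limit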
You can try adding a patch like this to your ort-wasm-simd-threaded.jsep.mjs to log WebAssembly memory growth:

const nativeGrow = WebAssembly.Memory.prototype.grow;
WebAssembly.Memory.prototype.grow = function (pages) {
  // Each wasm page is 64 KiB, so `pages / 16` is the growth in MB.
  const grownBytes = pages * 65536;
  console.log(
    `[ONNX] [${new Date().toISOString()}] Memory grew by ${pages} pages ` +
      `(${(pages / 16).toFixed(2)} MB). Current size: ` +
      `${((this.buffer.byteLength + grownBytes) / 1024 / 1024).toFixed(2)} MB.`,
  );
  return nativeGrow.call(this, pages);
};
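With that in place, if the logged size climbs to roughly 4096 MB right before the Aborted() error, you are hitting that 32-bit limit and the crash is almost certainly an OOM.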
This is reproducible on both WASM and WebGPU.
For example, when I load whisper-turbo q4 encoder-decoder:
When I load nllb-200-distilled-600M fp32 encoder q8 decoder:
This is very likely OOM as we can deduce from the model size.
Similarly, when I load nllb-200-distilled-600M q8 encoder fp32 decoder:
This should also be OOM but shows a minified error instead.
And the minified error code is pretty much random every time I try to load the same oversized model, so it doesn't give me any useful information.
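For a rough sense of scale, here is a back-of-envelope estimate (a sketch only: the ~600M parameter count is taken from the model name, the bytes-per-parameter figures are approximations, and activations, KV cache, and runtime buffers are ignored):

// Very rough weight-memory estimate for nllb-200-distilled-600M.
const params = 600e6;          // assumed total parameter count (from the model name)
const fp32Bytes = params * 4;  // fp32 = 4 bytes per parameter
const q8Bytes = params * 1;    // q8 ≈ 1 byte per parameter, ignoring quantization overhead
console.log(`all fp32: ~${(fp32Bytes / 2 ** 30).toFixed(1)} GiB`); // ~2.2 GiB
console.log(`all q8:   ~${(q8Bytes / 2 ** 30).toFixed(1)} GiB`);   // ~0.6 GiB
// A mixed fp32/q8 encoder-decoder split lands somewhere in between; once the
// runtime's own buffers are added on top, it is easy to approach the 4 GiB wasm
// limit while the model is being loaded.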