transformers.js
Phi-3.5-vision-instruct: Uncaught RuntimeError: Aborted(). Build with -sASSERTIONS for more info
System Info
@huggingface/[email protected]
Browser: Chromium Version 133.0.6845.0 (Developer Build) (64-bit)
Node/bundler: none, simple HTML setup
Environment/Platform
- [x] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)
Description
I just wanted to test your work on https://github.com/huggingface/transformers.js/pull/1094.
I took the example code and simply turned it into something you can put into an HTML file and run.
But the result is:
Do you have any idea what's going on? It seems to download the entire model, then tries to load it, but crashes shortly after.
Reproduction
<body>
  <script type="module">
    import {
      AutoProcessor,
      AutoModelForCausalLM,
      TextStreamer,
      load_image,
    } from "https://cdn.jsdelivr.net/npm/@huggingface/[email protected]";

    // Load processor and model
    const model_id = "onnx-community/Phi-3.5-vision-instruct";
    const processor = await AutoProcessor.from_pretrained(model_id, {
      legacy: true, // Use legacy to match python version
    });
    const model = await AutoModelForCausalLM.from_pretrained(model_id, {
      dtype: {
        vision_encoder: "q4", // 'q4' or 'q4f16'
        prepare_inputs_embeds: "q4", // 'q4' or 'q4f16'
        model: "q4f16", // 'q4f16'
      },
    });

    // Load image
    const image = await load_image("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/meme.png");

    // Prepare inputs
    const messages = [
      { role: "user", content: "<|image_1|>What's funny about this image?" },
    ];
    const prompt = processor.tokenizer.apply_chat_template(messages, {
      tokenize: false,
      add_generation_prompt: true,
    });
    const inputs = await processor(prompt, image, { num_crops: 4 });

    // (Optional) Set up text streamer
    const streamer = new TextStreamer(processor.tokenizer, {
      skip_prompt: true,
      skip_special_tokens: true,
    });

    // Generate response
    const output = await model.generate({
      ...inputs,
      streamer,
      max_new_tokens: 256,
    });
  </script>
</body>
Hello! Did you find a workaround for this, please? I am having the exact same issue with Phi-4-mini and Qwen2.5-1.5B, but I don't have any problems loading DeepSeek or Granite.
Having the same issue on a MacBook with an M1 chip and 32 GB of RAM.
I think this is just an OOM that occurs when the model needs to allocate more than 4 GB to load. I encountered the same in #1286.
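For context, here is a rough sketch of where that 4 GB ceiling comes from, assuming the usual 32-bit wasm build (wasm32 can address at most 65,536 pages of 64 KiB each):

// Back-of-envelope: the 32-bit WebAssembly address space caps linear memory.
const WASM_PAGE_BYTES = 64 * 1024; // 64 KiB per wasm page
const MAX_PAGES = 65536;           // maximum pages addressable with 32-bit pointers
console.log(`${(MAX_PAGES * WASM_PAGE_BYTES) / 2 ** 30} GiB`); // 4 GiB hard limit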
You can try adding a patch like this to your ort-wasm-simd-threaded.jsep.mjs to log WebAssembly memory growth:

const nativeGrow = WebAssembly.Memory.prototype.grow;
WebAssembly.Memory.prototype.grow = function (pages) {
  // Each wasm page is 64 KiB, so `pages / 16` is the growth in MB.
  const grownBytes = pages * 65536;
  console.log(
    `[ONNX] [${new Date().toISOString()}] Memory grew by ${pages} pages ` +
      `(${(pages / 16).toFixed(2)} MB). Current size: ` +
      `${((this.buffer.byteLength + grownBytes) / 1024 / 1024).toFixed(2)} MB.`,
  );
  return nativeGrow.call(this, pages);
};
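With that in place, if the logged size climbs to roughly 4096 MB right before the Aborted() error, you are hitting that 32-bit limit and the crash is almost certainly an OOM.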
This is reproducible on both WASM and WebGPU.
For example, when I load whisper-turbo q4 encoder-decoder:
When I load nllb-200-distilled-600M fp32 encoder q8 decoder:
This is very likely OOM as we can deduce from the model size.
Similarly, when I load nllb-200-distilled-600M q8 encoder fp32 decoder:
This should also be OOM but shows a minified error instead.
And the minified error code is pretty much random every time I try to load the same oversized model, so it doesn't give me any useful information.
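For a rough sense of scale, here is a back-of-envelope estimate (a sketch only: the ~600M parameter count is taken from the model name, the bytes-per-parameter figures are approximations, and activations, KV cache, and runtime buffers are ignored):

// Very rough weight-memory estimate for nllb-200-distilled-600M.
const params = 600e6;          // assumed total parameter count (from the model name)
const fp32Bytes = params * 4;  // fp32 = 4 bytes per parameter
const q8Bytes = params * 1;    // q8 ≈ 1 byte per parameter, ignoring quantization overhead
console.log(`all fp32: ~${(fp32Bytes / 2 ** 30).toFixed(1)} GiB`); // ~2.2 GiB
console.log(`all q8:   ~${(q8Bytes / 2 ** 30).toFixed(1)} GiB`);   // ~0.6 GiB
// A mixed fp32/q8 encoder-decoder split lands somewhere in between; once the
// runtime's own buffers are added on top, it is easy to approach the 4 GiB wasm
// limit while the model is being loaded.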