WebLLM always processes on Intel UHD Graphics, not on NVIDIA T1200
Hi!
I could not find any other forum to post this, so I will write it here: I am trying to use WebLLM from a Chromium-based browser (I am developing an add-in for Outlook, which uses a Blink WebView under the hood). So far WebLLM works, but inference always runs on my integrated graphics chip and leaves my dedicated GPU untouched. How can this behaviour be configured?
Thank you so far for your effort!
I have the exact same question.
It would be great if you could check https://webgpureport.org/ and send a screenshot; it may have to do with how we order adapters.
You can force Chrome on Windows to use the more powerful GPU by going to the Display > Graphics > Apps page, adding Chrome, clicking Options, and setting it to use the dedicated GPU.
Not an ideal outcome, but that is how it works right now.
I also ran into this on Windows 10 and 11: Chrome, Edge, and Brave all only return the low-power GPU, even if you request the high-performance powerPreference. You can verify this from the JS console by opening DevTools and running
const adapter = await navigator.gpu.requestAdapter({powerPreference: 'high-performance'})
and inspecting the result.
Note: the powerPreference IS honored on macOS.
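The one-liner above can be expanded into a small probe that requests both adapter variants and reports what the browser actually handed back. A sketch, not part of web-llm: `adapter.info` (GPUAdapterInfo) is only populated in newer Chromium builds, and the function simply reports WebGPU as unavailable in environments without `navigator.gpu` (e.g. Node):

```javascript
// Sketch: probe which physical adapter each powerPreference maps to.
// Paste into the DevTools console of a WebGPU-capable browser.
async function probeAdapters() {
  if (typeof navigator === 'undefined' || !navigator.gpu) {
    // No WebGPU in this environment (e.g. Node, or an older browser).
    return { supported: false, adapters: {} };
  }
  const adapters = {};
  for (const pref of ['low-power', 'high-performance']) {
    const adapter = await navigator.gpu.requestAdapter({ powerPreference: pref });
    // adapter.info exposes vendor/architecture strings in recent Chromium;
    // fall back to null when the adapter or the info field is missing.
    adapters[pref] = adapter ? (adapter.info ?? null) : null;
  }
  return { supported: true, adapters };
}

probeAdapters().then((r) => console.log(JSON.stringify(r, null, 2)));
```

On a hybrid machine that honors the preference, the two entries should show different vendors; on the affected Windows setups they come back identical.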
I have a hybrid Intel + NVIDIA GPU setup; although the NVIDIA card is far more powerful, I tend to use the Intel one for compatibility checks, mainly targeting mobile devices.
With that in mind, is the function that web-llm calls here: https://github.com/mlc-ai/web-llm/blob/082f04e4941ff4f6ef70731d244c69228948c7a1/src/service_worker.ts#L117 the one linked below? https://github.com/apache/tvm/blob/7ae7ea836169d3cf28b05c7d0dd2cb6a2045508e/web/src/webgpu.ts#L36
I'm asking because, even though it should default to the higher-performance adapter, it would be nice to have the option to use the low-power GPU when requested.
What would be the direction here? Should I open a PR on the apache/tvm repo?
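As a strawman for such a PR, the option could look something like this. This is NOT the actual tvm/web-llm API, just a sketch of threading a hypothetical `powerPreference` field through the adapter-detection call instead of hard-coding one value:

```javascript
// Hypothetical sketch (not tvm's real detectGPUDevice signature):
// let the caller choose the powerPreference, defaulting to
// 'high-performance' to preserve current behaviour.
async function detectGPUDevice(options = {}) {
  const pref = options.powerPreference ?? 'high-performance';
  if (typeof navigator === 'undefined' || !navigator.gpu) {
    throw new Error('WebGPU is not available in this environment');
  }
  return navigator.gpu.requestAdapter({ powerPreference: pref });
}

// Usage (in a browser): detectGPUDevice({ powerPreference: 'low-power' })
```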
The current issue is that web-llm is unable to use the NVIDIA GPU on hybrid Intel + NVIDIA devices, so you don't need to worry about it using the NVIDIA GPU by default for now.
const adapter = await navigator.gpu.requestAdapter({powerPreference: 'high-performance'})
This code does not work as it should on a dual-GPU device.
I think the way forward is to first make the high-performance GPU usable through WebGPU, and then provide parameters to choose which device to use. As for the default device, I support using the integrated graphics, but more discussion is still needed.
I have the same issue on an R7000p laptop.
Some relevant information can be found here: https://developer.mozilla.org/en-US/docs/Web/API/GPU/requestAdapter
Thanks for the reply!
This is odd: both https://webgpureport.org/ (see below) and https://github.com/huggingface/transformers.js see both the Intel GPU and the NVIDIA GPU.
For context, huggingface/transformers.js seems to use ONNX under the hood for its WebGPU support, and there's a flag that overrides which adapter is used, something like env.backends.onnx.webgpu.powerPreference = 'low-power' or 'high-performance'.
I've switched to web-llm because it currently seems to offer better performance and model support.
I also support defaulting to the integrated graphics (or even an NPU/Neural Engine/etc. in the future, if something like WebNN becomes a reality) and leaving it to the implementation to notify the user to switch to more powerful hardware. But for now, a flag or an argument to select the GPU would be just fine.
I've attached the screenshot below from webgpureport.org on my system for further inspection:
I did open a PR to address device selection on tvm's repo, but I don't feel it's going to get merged anytime soon: https://github.com/apache/tvm/pull/17545/files
After half an hour of googling around I came across the --use-webgpu-power-preference=force-low-power command-line flag, from this Chromium source at about line 60.
With it you can force your Chromium-based browser to use the low-power GPU, in my case the Intel one.
My NVIDIA card still shows up on chrome://gpu, but sites cannot see it anymore. The integrated Intel GPU spikes to 100% use when running web-llm inference (checked with intel_gpu_top) while the RTX 4060 remains idle.
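For reference, the launch line looks like this (a sketch; the binary name varies per browser and distro, e.g. google-chrome, chromium, brave-browser):

```shell
# Force WebGPU onto the low-power (integrated) adapter, using the
# Chromium switch mentioned above. Must be passed at launch; restart
# the browser fully for it to take effect.
google-chrome --use-webgpu-power-preference=force-low-power
```

Per the same Chromium source, the switch also accepts other values such as force-high-performance, which may be worth trying for the opposite problem.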
I have the same setup, Intel + NVIDIA, and only the CPU is used by the application. Here are the WebGPU and chrome://gpu reports in case they help with support for hybrid setups.
webgpureport-2025-07-19T16-09-31-990Z.txt about-gpu-2025-07-19T16-09-27-847Z.txt
Actually, I think it works as intended: I started Chrome with google-chrome --enable-features=Vulkan as instructed here, and now the GPU is used as expected. Here is the updated chrome://gpu report; notice the presence of Vulkan: Enabled.
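For anyone landing here on Linux, the full launch line is (a sketch; binary name varies, and this only helps where Vulkan was the missing piece):

```shell
# Enable the Vulkan backend so Chromium can expose the dedicated GPU
# to WebGPU; verify afterwards on chrome://gpu ("Vulkan: Enabled").
google-chrome --enable-features=Vulkan
```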