
WebLLM always processes on Intel UHD Graphics, not on NVIDIA T1200

b521f771d8991e6f1d8e65ae05a8d783 opened this issue 1 year ago · 11 comments

Hi!

I could not find any other forum to post this, so I will write it here: I am trying to use WebLLM via a Chromium-based browser (I am developing an Add-In for Outlook, which uses the Blink WebView under the hood). So far, WebLLM works, but it always runs on my integrated graphics chip and leaves my dedicated GPU untouched. How can this behaviour be configured?

Thank you so far for your effort!

I have the exact same question.

ReneLH · Oct 24 '24

Would be great if you can check https://webgpureport.org/ and send a screenshot; it may have to do with how we order adapters.

tqchen · Oct 24 '24

You can force Chrome on Windows to use the more powerful GPU by going to the Display > Graphics > Apps page, adding Chrome, clicking Options, and setting it to use the dedicated GPU.

Not an ideal outcome, but that's how it works right now.

Iternal-JBH4 · Oct 24 '24

Also ran into this on Windows 10 and 11: Chrome, Edge, and Brave all only give the low-power GPU, even if you request it with the high-performance powerPreference. You can verify this from the JS console by opening DevTools and running

```js
const adapter = await navigator.gpu.requestAdapter({powerPreference: 'high-performance'})
```

and inspecting the result.
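For example, a hedged sketch of inspecting which adapter actually came back (assuming a Chromium build that exposes either the newer adapter.info attribute or the older requestAdapterInfo() method):

```js
// Quick DevTools-console check: which physical GPU backs the adapter we got?
const adapter = await navigator.gpu.requestAdapter({ powerPreference: 'high-performance' });
if (!adapter) throw new Error('No WebGPU adapter available');
// Newer Chromium exposes `adapter.info`; older builds had `requestAdapterInfo()` instead.
const info = adapter.info ?? (adapter.requestAdapterInfo ? await adapter.requestAdapterInfo() : null);
console.log(info?.vendor, info?.architecture, info?.description);
```

On the affected Windows machines this reports the integrated Intel adapter even with the high-performance preference.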

Note: the powerPreference IS honored on macOS.

StevenHanbyWilliams · Oct 30 '24

I have a hybrid Intel + NVIDIA GPU setup. Although the NVIDIA one is far more powerful, I tend to use the Intel one for compatibility checks, mainly targeting mobile devices.

With that in mind, is the function that web-llm calls here (https://github.com/mlc-ai/web-llm/blob/082f04e4941ff4f6ef70731d244c69228948c7a1/src/service_worker.ts#L117) the one linked below? https://github.com/apache/tvm/blob/7ae7ea836169d3cf28b05c7d0dd2cb6a2045508e/web/src/webgpu.ts#L36

I'm asking because, even though it should default to the higher-performance one, it would be nice to have the option to use the low-power GPU when requested. What would be the direction here: should I open a PR on the apache/tvm repo?
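For reference, a minimal sketch of what such an opt-in could look like on the requesting side (this is not the existing web-llm or tvm code; the helper name and its default are made up for illustration):

```js
// Hypothetical helper, not part of web-llm or tvmjs: request a WebGPU adapter
// with a caller-chosen power preference instead of a hard-coded one.
async function requestAdapterWithPreference(preference = 'high-performance') {
  if (!('gpu' in navigator)) {
    throw new Error('WebGPU is not available in this browser');
  }
  // preference is either 'high-performance' or 'low-power'
  const adapter = await navigator.gpu.requestAdapter({ powerPreference: preference });
  if (adapter === null) {
    throw new Error('Unable to obtain a WebGPU adapter');
  }
  return adapter;
}
```

An exposed parameter like this would let callers opt into the low-power adapter while keeping high-performance as the default.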

marschr · Nov 27 '24

> I have a hybrid Intel + NVIDIA GPU setup. Although the NVIDIA one is far more powerful, I tend to use the Intel one for compatibility checks, mainly targeting mobile devices.
>
> With that in mind, is the function that web-llm calls here (https://github.com/mlc-ai/web-llm/blob/082f04e4941ff4f6ef70731d244c69228948c7a1/src/service_worker.ts#L117) the one linked below? https://github.com/apache/tvm/blob/7ae7ea836169d3cf28b05c7d0dd2cb6a2045508e/web/src/webgpu.ts#L36
>
> I'm asking because, even though it should default to the higher-performance one, it would be nice to have the option to use the low-power GPU when requested. What would be the direction here: should I open a PR on the apache/tvm repo?

The current issue is that webllm is unable to use NVIDIA graphics on hybrid Intel + NVIDIA GPU devices. So you don't need to worry about using NVIDIA graphics by default for now.

```js
const adapter = await navigator.gpu.requestAdapter({powerPreference: 'high-performance'})
```

This code does not work as it should on a dual-GPU device.

I think the way forward should be to first make high-performance GPUs available through WebGPU, and then provide parameters to choose which device to use. As for the default device, I support using integrated graphics, but more discussion is still needed.

zhibisora · Nov 29 '24

I have the same issue on an R7000p laptop.

Some relevant information can be found here: https://developer.mozilla.org/en-US/docs/Web/API/GPU/requestAdapter

zhibisora · Nov 29 '24

> The current issue is that webllm is unable to use NVIDIA graphics on hybrid Intel + NVIDIA GPU devices. So you don't need to worry about using NVIDIA graphics by default for now.
>
> `const adapter = await navigator.gpu.requestAdapter({powerPreference: 'high-performance'})`
>
> This code does not work as it should on a dual-GPU device.
>
> I think the way forward should be to first make high-performance GPUs available through WebGPU, and then provide parameters to choose which device to use. As for the default device, I support using integrated graphics, but more discussion is still needed.

Thanks for the reply!

This is odd: both https://webgpureport.org/ (see below) and https://github.com/huggingface/transformers.js can see both the Intel GPU and the NVIDIA GPU.

For context, huggingface/transformers.js seems to use ONNX Runtime under the hood for its WebGPU support, and there is a flag that selects one GPU or the other, something like env.backends.onnx.webgpu.powerPreference = 'low-power' or 'high-performance'.
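For illustration, a minimal sketch of that flag, assuming the transformers.js v3 package name and that the property path is as described above (both are assumptions, not verified API):

```js
// Hedged sketch: ask the ONNX Runtime WebGPU backend for the integrated GPU.
// Package name and property path are assumptions based on the comment above.
import { env } from '@huggingface/transformers';

env.backends.onnx.webgpu.powerPreference = 'low-power'; // or 'high-performance'
```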

I've switched to web-llm because it currently seems to offer better performance and model support.

I also support defaulting to the integrated graphics (or even an NPU/Neural Engine/etc. in the future, if something like WebNN becomes a reality) and leaving it to the user/implementation to suggest switching to more powerful hardware, but for now a flag or an argument to select the GPU would be just fine.

I've attached a screenshot of webgpureport.org on my system for further inspection: [Screenshot from 2024-12-04 01-18-05]

I did open a PR to address device selection in the tvm repo, but I don't think it's going to get merged anytime soon: https://github.com/apache/tvm/pull/17545/files

marschr · Dec 04 '24

After half an hour of googling around I came to the --use-webgpu-power-preference=force-low-power command-line flag, from this Chromium source at about line 60. With it you can force your Chromium-based browser to use the low-power GPU, in my case the Intel one. My NVIDIA still shows up on chrome://gpu, but sites cannot see it anymore. The integrated Intel GPU spikes to 100% use when running web-llm inference (check with intel_gpu_top) while the RTX 4060 stays idle.

marschr · Dec 04 '24

I have the same setup, Intel + NVIDIA, and only the CPU is used by the application. Here are the WebGPU and chrome://gpu reports in case they help with support for hybrid setups.

webgpureport-2025-07-19T16-09-31-990Z.txt about-gpu-2025-07-19T16-09-27-847Z.txt

rvernica · Jul 19 '25

Actually, I think it works as intended. I started Chrome with google-chrome --enable-features=Vulkan as instructed here, and now the GPU is used as expected. Here is the updated chrome://gpu report; notice the presence of Vulkan: Enabled.

about-gpu-2025-07-19T16-36-31-219Z.txt

rvernica · Jul 19 '25