web-llm

Performance on NVIDIA GPU (discrete) seems to be much worse than AMD (integrated) GPU - is that expected?

Open armsp opened this issue 2 years ago • 2 comments

I have an integrated AMD GPU (512 MB dedicated memory, 11.6 GB shared memory) and a discrete NVIDIA GPU (6 GB dedicated memory, 11.6 GB shared). The results were quite unexpected (screenshot attached).

When using AMD, mostly shared memory was used (8.3/11.6 GB), but on NVIDIA it was the dedicated memory (5.7/6 GB). I expected the opposite. I ran Chrome on the NVIDIA GPU and Canary on the integrated AMD. (It did seem to me that different models were loaded, but I don't have a screenshot for that.)

armsp avatar May 11 '23 07:05 armsp

I think the models were the same.

AMD

[System Initalize] Initialize GPU device: WebGPU - amd
[System Initalize] Fetching param cache[81/163]: 2006MB fetched. 49% completed, 22 secs elapsed. It can take a while when we first visit this page to populate the cache. Later refreshes will become faster.
[System Initalize] Loading GPU shader modules[50/54]: 92% completed, 4 secs elapsed.

NVIDIA

[System Initalize] Initialize GPU device: WebGPU - NVIDIA GeForce RTX 3060 Laptop GPU
[System Initalize] Fetching param cache[55/163]: 1372MB fetched. 34% completed, 10 secs elapsed. It can take a while when we first visit this page to populate the cache. Later refreshes will become faster.
[System Initalize] Loading GPU shader modules[46/54]: 85% completed, 3 secs elapsed.

Performance still seems to be much better with the integrated GPU than with the NVIDIA GPU.

armsp avatar May 11 '23 07:05 armsp

You're being bottlenecked by the 6 GB of dedicated memory on the NVIDIA laptop GPU, which constantly swaps data in and out to run inference on the model. The integrated AMD doesn't have to swap, because all of its memory is shared (its "dedicated" GPU memory is really just reserved shared memory).
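A back-of-envelope check supports this. The NVIDIA log above shows 2006 MB fetched at 49% of the parameter cache, which pins down the total weight size; the runtime overhead figure below is an assumption for illustration, not a measured number:

```javascript
// Estimate total parameter-cache size from the log line:
// "2006MB fetched. 49% completed" (but note the log quoted is the AMD run;
// both runs appear to load the same 163-shard cache).
const fetchedMB = 2006;
const fetchedFraction = 0.49;
const paramsMB = fetchedMB / fetchedFraction; // ≈ 4094 MB of weights alone

// Rough allowance for KV cache, activations, and WebGPU buffers (assumption):
const runtimeOverheadMB = 1500;
const totalMB = paramsMB + runtimeOverheadMB; // ≈ 5.6 GB

console.log(Math.round(paramsMB), Math.round(totalMB));
```

That lands close to the 5.7/6 GB dedicated-memory usage reported above, so any additional allocation has to spill into shared memory over PCIe, which is much slower than on-card VRAM.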

Foxlum avatar May 15 '23 23:05 Foxlum

@Foxlum I see. Is there any way to make this faster while using the NVIDIA GPU, or will this always be a problem as long as the VRAM is smaller than the model requires?

armsp avatar May 21 '23 11:05 armsp

Are you sure it's even using the NVIDIA card? On my laptop, the WebLLM demo loads the model into the Intel UHD integrated GPU's shared memory and processes it there very slowly, instead of using my discrete 3080.

RickieChang avatar May 26 '23 22:05 RickieChang

@RickieChang Yes. If you look at the screenshots I shared above, you can see that it uses the NVIDIA card in the left image.

armsp avatar May 27 '23 08:05 armsp
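For anyone else checking which adapter the browser actually handed to WebGPU: `navigator.gpu` only exists in a WebGPU-capable browser, so this is a sketch with the browser calls shown as comments; the `describeAdapter` helper and its labels are hypothetical, and `requestAdapterInfo()` is the adapter-info API as it existed around this time (later spec revisions changed it):

```javascript
// Classify an adapter from the vendor/device strings WebGPU reports.
// (Hypothetical helper for illustration — labels are our own.)
function describeAdapter(info) {
  const s = `${info.vendor || ''} ${info.device || ''} ${info.description || ''}`.toLowerCase();
  if (s.includes('nvidia')) return 'NVIDIA (discrete)';
  if (s.includes('amd')) return 'AMD';
  if (s.includes('intel')) return 'Intel (integrated)';
  return 'unknown';
}

// In a WebGPU-capable browser you would run:
// const adapter = await navigator.gpu.requestAdapter({
//   powerPreference: 'high-performance', // hint the browser toward the discrete GPU
// });
// const info = await adapter.requestAdapterInfo();
// console.log(describeAdapter(info));
```

The `powerPreference: 'high-performance'` hint is part of the WebGPU spec, but browsers are free to ignore it, which is one reason a laptop can end up running on the integrated GPU even with a discrete card present.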

This is likely due to a VRAM issue. The latest update comes with a smaller model that should be faster.

tqchen avatar Jun 16 '23 15:06 tqchen
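To see why a smaller (more aggressively quantized) model helps, here is a rough weight-memory estimate for a 7B-parameter model at different bit widths; the specific bit widths are illustrative assumptions, not a statement of which variants web-llm shipped:

```javascript
// Weight memory for a model with `params` parameters stored at `bits` bits each.
const params = 7e9; // assumption: a 7B-parameter model
const weightMB = (bits) => (params * bits) / 8 / 2 ** 20;

console.log(Math.round(weightMB(4))); // 4-bit weights ≈ 3338 MB
console.log(Math.round(weightMB(3))); // 3-bit weights ≈ 2503 MB
```

Dropping from 4-bit to 3-bit weights saves roughly 800 MB here, which on a 6 GB card can be the difference between fitting entirely in dedicated VRAM and spilling into slow shared memory.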