private-gpt

Windows: GPU is not taxed enough compared to other apps/frameworks

Open ramz35 opened this issue 11 months ago • 2 comments

I have a CUDA-compiled llama-cpp-python, and on startup I can see the GPU correctly identified and all 33 layers offloaded to it. I can see the model fits into my VRAM. Whether I run the default model or one of my local ones (e.g. TheBloke-LLama2-Chat-7B), the GPU doesn't go past about 38% utilization, while the CPU is at most 20%. In other apps/frameworks, e.g. h2ogpt or llama.cpp, the GPU is maxed out while CPU usage is similar or lower.
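For reference, this is roughly how I verified the CUDA build and watched utilization while prompting. A sketch only: the reinstall command follows the llama-cpp-python install docs of the time (the `LLAMA_CUBLAS` flag name may differ in newer releases), and `nvidia-smi` is the stock NVIDIA monitoring tool.

```shell
# Rebuild llama-cpp-python with CUDA (cuBLAS) support, skipping any cached CPU-only wheel.
# Flag name per the llama-cpp-python docs circa early 2024; newer versions may use a different one.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

# In a second terminal, sample GPU utilization and VRAM use once per second while a prompt runs.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```

If the startup log still shows 33/33 layers offloaded but utilization stays low here too, the bottleneck is presumably elsewhere in the pipeline rather than in the llama.cpp build itself.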

Just curious whether this is a bug, a non-optimal version of the code, or working as designed?

edit: there was another (closed) issue that mentioned an untaxed GPU, but with the CPU at full pelt; that was apparently fixed in version 0.3.0, which I have. It's also not exactly the same as my case, since my CPU isn't taxed either.

ramz35 avatar Mar 06 '24 16:03 ramz35

I have similar results with Mistral 7B.

Zirgite avatar Mar 09 '24 11:03 Zirgite

+1 on Ubuntu

paul-asvb avatar Mar 13 '24 10:03 paul-asvb