text-generation-webui
Specify which GPU to use
Hey Ooba,
I have a system with two GPUs. Even when not using the --auto-devices flag, it still splits the model 50/50 between the two GPUs, causing very slow performance.
Is there a way to specify which GPU to use for loading the model?
Bonus question: is there a way to specify which GPU to use for the LLM while still keeping the second GPU available for extensions (in my case BarkTTS)?
Thanks!
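For reference, the usual way to pin a CUDA process to one GPU is the CUDA_VISIBLE_DEVICES environment variable, e.g. launching with `CUDA_VISIBLE_DEVICES=0 python server.py`. The caveat is that it must be set before anything initializes CUDA. A minimal sketch of the env-var approach (generic CUDA/PyTorch behavior, nothing specific to text-generation-webui):

```python
import os

# Hide every GPU except device 0 from this process. This only takes effect
# if it runs before torch (or anything else that initializes CUDA) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported after the env var on purpose

print(torch.cuda.device_count())  # now reports 1: only the selected GPU is visible
```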
Thanks, yeah, I gave that a go by setting it early in server.py, and it works as far as limiting everything to one GPU. But I'm guessing it then limits the whole session to that GPU, including any extensions: I wasn't able to get the Bark extension to use the other GPU by setting CUDA_VISIBLE_DEVICES in the extension's script.py.
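That matches how CUDA behaves: CUDA_VISIBLE_DEVICES is read once per process when the CUDA context is created, and extensions run in the same process as the webui, so setting it in the extension's script.py comes too late. A workaround sketch, assuming both GPUs stay visible to the process (i.e. the LLM is kept on GPU 0 by other means, such as the webui's --gpu-memory flag, if I recall its syntax correctly) and with `bark_model` as a hypothetical stand-in for whatever the extension loads, is to place the extension's model on cuda:1 explicitly:

```python
import torch

def pick_extension_device() -> torch.device:
    """Prefer the second visible GPU for the extension's model."""
    if torch.cuda.device_count() > 1:
        return torch.device("cuda:1")
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical usage inside an extension's script.py; 'bark_model' and
# 'input_tensors' stand in for whatever the extension actually creates.
device = pick_extension_device()
# bark_model = bark_model.to(device)
# input_tensors = input_tensors.to(device)  # inputs must live on the same device
```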
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.