text-generation-webui
CPU mode only uses one core; how can the core count be increased?
I am trying to get llama running on this in CPU mode (24-core EPYC). Running with
python3 server.py --model llama-13b --load-in-8bit --no-stream --cpu
a single response takes about 300 seconds for 200 tokens.
However, in this setup it only uses a single CPU core, and I don't see any argument to increase this value. Can it be increased in some other way, or am I hitting a limitation of Python?
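A quick diagnostic sketch (not part of the original report) to confirm how many threads PyTorch is actually configured to use on a box like this:

```python
# Diagnostic: compare the cores the OS reports with the threads PyTorch
# is configured to use for intra-op and inter-op parallelism.
import os
import torch

print("CPU cores reported by the OS:", os.cpu_count())
print("torch intra-op threads:", torch.get_num_threads())
print("torch inter-op threads:", torch.get_num_interop_threads())
```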
Try removing --load-in-8bit; it is not meant to be used in CPU mode.
Unfortunately removing this param doesn't change anything. It still pegs just one CPU core.

Before:
Output generated in 317.38 seconds (0.63 tokens/s, 200 tokens)
Output generated in 328.47 seconds (0.61 tokens/s, 200 tokens)
Output generated in 314.88 seconds (0.64 tokens/s, 200 tokens)

After:
Output generated in 316.73 seconds (0.63 tokens/s, 200 tokens)
Anything else I can try?
It's weird that it's using just 1 core. Last year I used CPU mode a lot, and what I noticed is that PyTorch only uses 50% of the CPU cores. I just ran a test with llama-7b and the behavior was the same.
There is a workaround to force usage of all CPU cores, but it didn't lead to any improvement in performance. If you search through the past issues you can probably find something about that.
What is your OS?
Nothing exotic - Debian 11.6
I did try adding torch.set_num_threads(24) to modules/text_generation.py, as suggested in https://github.com/oobabooga/text-generation-webui/issues/8, but I am still seeing the same thing.
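For anyone landing here, a minimal sketch of the threading knobs usually involved: the set_num_threads call matches the linked issue, the environment variables are standard OpenMP/MKL settings, and the value 24 is just this machine's core count. The environment variables need to be set before torch is imported:

```python
# Threading knobs for CPU inference with PyTorch. The environment
# variables must be set before torch is imported, otherwise the
# OpenMP/MKL runtimes may already be initialized with their defaults.
import os

os.environ["OMP_NUM_THREADS"] = "24"  # OpenMP worker threads
os.environ["MKL_NUM_THREADS"] = "24"  # MKL worker threads

import torch

torch.set_num_threads(24)          # intra-op parallelism (matmuls etc.)
torch.set_num_interop_threads(24)  # inter-op parallelism between operators

print(torch.get_num_threads(), torch.get_num_interop_threads())
```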
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
Have there been any updates? I am running into the same issue on Arch. I have a server with 32 cores, but only 12 cores are being used with the model deepseek-coder-6.7b-instruct.Q4_K_M.gguf.
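Worth noting for GGUF models: the generation thread count there is a llama.cpp setting rather than a PyTorch one. A minimal sketch, assuming the llama-cpp-python bindings (the model path is just an example):

```python
# With GGUF models the thread count is a llama.cpp parameter, not a
# PyTorch one. Sketch assuming the llama-cpp-python bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-coder-6.7b-instruct.Q4_K_M.gguf",
    n_threads=32,  # threads used during generation
)
out = llm("def fibonacci(n):", max_tokens=64)
print(out["choices"][0]["text"])
```

The webui's llama.cpp loader exposes a similar threads setting in its model options, so that is worth checking before assuming a PyTorch-level limit.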
I am using ollama instead, which does a much better job at CPU utilization; however, it sometimes gets stuck and never produces an output, so it's a tradeoff.