gpt4all
[Feature] Support more than 64 CPU threads
Feature Request
Currently, the GPT4All settings only allow setting the maximum CPU threads to 64.
When I set it to 192 (which is my current hardware setup), it always reverts to 64.
Ollama by itself supports 192 (I tried that), so maybe this is just some UI restriction? I didn't look in the code.
Thanks
I'm not from Nomic, but I have to ask: what is the benefit of even 64 CPU threads? Have you benchmarked 64 threads vs. 32, 16, or even 8, and found that higher counts (after a certain point) are better? My understanding, borne out by my own tests, is that after 6-8 CPU threads the memory bus is saturated, and more threads tend to do nothing. Maybe you can use a few more than that on Epyc; just don't otherwise expect much more than that to actually accomplish anything. If I am wrong, I would appreciate learning from tests you have done.
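The memory-bus argument above can be illustrated with rough back-of-the-envelope numbers. The figures below (model size, memory bandwidth, per-core bandwidth) are assumptions for the sake of the example, not measurements:

```python
# Rough estimate of why CPU token generation is memory-bound:
# producing each token requires streaming (roughly) all of the
# model's weights from RAM, so throughput is capped by memory
# bandwidth, not by how many threads are computing.

model_size_gb = 4.0        # assumed: a 7B model at 4-bit quantization
mem_bandwidth_gbps = 50.0  # assumed: typical dual-channel desktop DDR

# Upper bound on tokens/second, independent of thread count:
max_tokens_per_s = mem_bandwidth_gbps / model_size_gb
print(f"bandwidth-bound ceiling: ~{max_tokens_per_s:.1f} tokens/s")

# Threads only help until they can jointly saturate that bandwidth.
# If one core can stream ~8 GB/s, a handful of threads suffices:
per_core_gbps = 8.0  # assumed
threads_to_saturate = mem_bandwidth_gbps / per_core_gbps
print(f"threads needed to saturate memory: ~{threads_to_saturate:.0f}")
```

Beyond that saturation point, extra threads mostly contend for the same memory channels, which is why benchmarks often show a plateau (or even a slowdown) well below the core count.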
Looks like code is here: https://github.com/nomic-ai/gpt4all/blob/56d5a230014d294bf5a05ffe27afb447e7c40449/gpt4all-chat/mysettings.cpp#L434-L443
Which means, the thread count is determined by what Qt thinks should be the upper limit. I'm unsure whether you'd get more performance out of it with a higher value, in any case.
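As a hypothetical sketch of what such a clamp looks like (this is an illustration in Python, not the actual Qt/C++ code from mysettings.cpp; `os.cpu_count()` stands in for `QThread::idealThreadCount()`):

```python
import os

def clamp_thread_count(requested: int) -> int:
    """Clamp a user-requested thread count to the platform's
    reported limit, falling back to 1 if detection fails.

    Illustrative only: it mirrors the idea of capping at an
    'ideal' upper bound, which is what the linked settings code
    appears to do via Qt.
    """
    ideal = os.cpu_count() or 1  # analogue of QThread::idealThreadCount()
    return max(1, min(requested, ideal))

# On a machine where the platform reports 64 logical CPUs,
# a request for 192 would be silently clamped to 64:
print(clamp_thread_count(192))
```

If the platform API itself under-reports the CPU count (as some Windows APIs do on machines with more than 64 logical processors, due to processor groups), the clamp inherits that wrong limit even though the hardware has more cores.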
Is it caused due to this? https://stackoverflow.com/questions/46314471/qthreadidealthreadcount-returns-the-wrong-answer-how-to-solve-it
> Is it caused due to this?
Maybe, although I don't know about Qt internals, and that Q&A is quite old. In any case, as chrisbarrera said, I'm not even sure it would help to go past the Qt-defined limit.
llama.cpp on CPU is memory-bottlenecked in practice, so using more CPU threads doesn't provide much benefit. The default of 4 threads is enough on my machine. Try with Ollama or the llama.cpp CLI and see whether you actually get any t/s improvement over 64 threads; you may actually see a slowdown.
The speed does not scale linearly with the number of CPU threads. We have chosen 4 because generation speed seems to plateau at that point. There is no need to use more.