
[Feature] Support more than 64 CPU threads

Open fixerivan opened this issue 1 year ago • 5 comments

Feature Request

Currently, the GPT4All settings only allow setting the max CPU threads to 64.

When I set it to 192 (which matches my current hardware setup), it always reverts to 64.

Ollama by itself supports 192 threads — I tried it — so maybe this is just a UI restriction? I didn't look at the code.

thanks

fixerivan avatar Jul 19 '24 10:07 fixerivan

I'm not from Nomic, but I have to ask: what is the benefit of even 64 CPU threads? Have you benchmarked 64 threads vs. 32, 16, or even 8, and found that higher counts (past a certain point) are better? My understanding, backed by my own tests, is that after 6-8 CPU threads the memory bus is saturated and additional threads tend to do nothing. Maybe you can go a few higher than that on Epyc, but otherwise don't expect much more than that to actually accomplish anything. If I am wrong, I would appreciate learning from tests you have done.

chrisbarrera avatar Jul 19 '24 14:07 chrisbarrera

Looks like code is here: https://github.com/nomic-ai/gpt4all/blob/56d5a230014d294bf5a05ffe27afb447e7c40449/gpt4all-chat/mysettings.cpp#L434-L443

Which means the upper limit is whatever Qt considers the ideal thread count. I'm unsure whether you'd get more performance out of a higher value, in any case.

cosmic-snow avatar Jul 19 '24 15:07 cosmic-snow

Is it caused by this? https://stackoverflow.com/questions/46314471/qthreadidealthreadcount-returns-the-wrong-answer-how-to-solve-it

supersonictw avatar Jul 19 '24 20:07 supersonictw

Is it caused by this?

Maybe, although I don't know much about Qt internals, and that Q&A is quite old. In any case, as chrisbarrera said, I'm not even sure it would help to go past the Qt-defined limit.

cosmic-snow avatar Jul 19 '24 23:07 cosmic-snow

llama.cpp on CPU is memory-bottlenecked in practice, so using more CPU threads doesn't provide much benefit. The default of 4 threads is enough on my machine. Try with ollama or the llama.cpp CLI and see if you actually get any t/s (tokens per second) improvement compared to 64 threads — you may actually see a slowdown.

cebtenzzre avatar Jul 29 '24 16:07 cebtenzzre
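One way to run the suggested comparison is a thread-count sweep with llama.cpp's `llama-bench` tool (`-t` sets the thread count). This is a dry-run sketch that only prints the commands; the model path is a placeholder — substitute your own `.gguf` file and drop the `echo` to actually run them.

```shell
for t in 4 8 16 32 64; do
  echo "llama-bench -m model.gguf -t $t"
done
```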

The speed does not scale linearly with the number of CPU threads. We have chosen 4 because generation speed seems to plateau at that point. There is no need to use more.

cebtenzzre avatar Feb 12 '25 19:02 cebtenzzre