gpt4all
"CPU threads" option in settings has no impact on speed
System Info
The number of CPU threads has no impact on text-generation speed: it's always 4.4 tokens/sec with the Groovy model, according to gpt4all.
Version 2.4.13, Windows 10, CPU: Intel i7-10700. Model tested: Groovy
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [ ] backend
- [ ] bindings
- [ ] python-bindings
- [ ] chat-ui
- [ ] models
- [ ] circleci
- [ ] docker
- [ ] api
Reproduction
- Use 12 threads
- Check the tokens speed
- Change threads to 2
- Check the tokens speed
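The repro above can be scripted instead of eyeballed in the UI. Below is a minimal timing harness sketch: `benchmark_thread_counts` takes a hypothetical factory `make_generator(n_threads)` that you would implement yourself, e.g. by wrapping the gpt4all Python bindings (recent versions accept an `n_threads` argument on the `GPT4All` constructor, but treat that as an assumption and check your installed version).

```python
import time

def tokens_per_second(generate, prompt: str, n_tokens: int) -> float:
    """Time one generation call and return tokens generated per second."""
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

def benchmark_thread_counts(make_generator, thread_counts,
                            prompt: str = "Hello", n_tokens: int = 64) -> dict:
    """Run the same generation at each thread count and collect tokens/sec.

    make_generator(n_threads) is a user-supplied (hypothetical) factory, e.g.
    lambda n: GPT4All("ggml-gpt4all-j-v1.3-groovy", n_threads=n).generate
    """
    return {n: tokens_per_second(make_generator(n), prompt, n_tokens)
            for n in thread_counts}
```

If the bug report is accurate, the resulting dict should show nearly identical tokens/sec for 2 and 12 threads.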
Expected behavior
Shouldn't there be a difference in speed between 12 threads and 2 threads?
Hi,
I've just been testing and I figured something out.
On my old laptop... tokens per second increase going from 1 thread up to 4 threads; 5 and 6 threads are about the same; 8 threads is almost as slow as 1 thread, maybe comparable to 2 threads, yet it still uses the whole CPU and its energy.
It appears to be a problem in the scheduling or something like that.
I can give it a try on my Mac later, but at least for now I can kind of confirm this... if anything it is worse than described here.
This laptop has 36 GB RAM, an MX130 with 4 GB of video memory (of which obviously none is used), and the CPU is an i7-8550U. My OS is Kubuntu 22.04.
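One detail worth noting: the i7-8550U has 4 physical cores and 8 logical CPUs (2-way hyper-threading), so the observed peak around 4 threads and the slowdown at 8 matches the physical core count, not the logical one. A small sketch of that heuristic (assuming 2-way SMT, which holds on most Intel/AMD desktop and laptop parts but is an assumption, not something the bindings report):

```python
import os

def recommended_threads(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Memory-bound LLM inference usually peaks near the physical core count.

    os.cpu_count() reports logical CPUs; dividing by the SMT width
    (assumed 2 here) approximates physical cores.
    """
    return max(1, logical_cpus // threads_per_core)

# i7-8550U: 8 logical CPUs -> 4 physical cores, matching the sweet spot above.
print(recommended_threads(8))                 # -> 4
print(recommended_threads(os.cpu_count() or 1))
```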
This is because llama.cpp is bottlenecked by memory bandwidth. So using more CPU cores doesn't help, because the system RAM simply can't keep up. GPUs work better because they have much faster memory.
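A back-of-the-envelope check makes the bandwidth argument concrete. During decoding, each generated token streams roughly the entire weight set from RAM, so tokens/sec is capped at about bandwidth divided by model size. The numbers below are illustrative assumptions (a ~4 GB 4-bit quantized GPT-J model and ~25 GB/s of effective dual-channel DDR4 bandwidth), not measurements:

```python
def bandwidth_bound_tps(model_bytes: float, mem_bandwidth_gbs: float) -> float:
    """Rough upper bound on tokens/sec for a memory-bound decoder:
    each token reads (roughly) all weights, so tps <= bandwidth / model size."""
    return mem_bandwidth_gbs * 1e9 / model_bytes

# Assumed: ~4 GB quantized model, ~25 GB/s effective memory bandwidth.
print(bandwidth_bound_tps(4e9, 25.0))  # -> 6.25 tokens/sec ceiling
```

A few cores already saturate that bandwidth, which is consistent with the reported 4.4 tokens/sec being flat across thread counts.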