gpt4all
"CPU threads" option in settings has no impact on speed
System Info
The number of CPU threads has no impact on text-generation speed: it's always 4.4 tokens/sec with the Groovy model, according to gpt4all.
Version 2.4.13, Windows 10, CPU: Intel i7-10700. Model tested: Groovy
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [ ] backend
- [ ] bindings
- [ ] python-bindings
- [ ] chat-ui
- [ ] models
- [ ] circleci
- [ ] docker
- [ ] api
Reproduction
- Use 12 threads
- Check the tokens speed
- Change threads to 2
- Check the tokens speed
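The repro above can be scripted instead of eyeballed in the UI. Below is a minimal timing harness sketch: `benchmark_thread_counts` takes a hypothetical factory `make_generator(n_threads)` that you would implement yourself, e.g. by wrapping the gpt4all Python bindings (recent versions accept an `n_threads` argument on the `GPT4All` constructor, but treat that as an assumption and check your installed version).

```python
import time

def tokens_per_second(generate, prompt: str, n_tokens: int) -> float:
    """Time one generation call and return tokens generated per second."""
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

def benchmark_thread_counts(make_generator, thread_counts,
                            prompt: str = "Hello", n_tokens: int = 64) -> dict:
    """Run the same generation at each thread count and collect tokens/sec.

    make_generator(n_threads) is a user-supplied (hypothetical) factory, e.g.
    lambda n: GPT4All("ggml-gpt4all-j-v1.3-groovy", n_threads=n).generate
    """
    return {n: tokens_per_second(make_generator(n), prompt, n_tokens)
            for n in thread_counts}
```

If the bug report is accurate, the resulting dict should show nearly identical tokens/sec for 2 and 12 threads.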
Expected behavior
Shouldn't there be a difference in speed between 12 threads and 2 threads?
Hi,
I've just been testing and I figured something out.
On my old laptop... tokens per second increase going from 1 thread up to 4 threads; 5 and 6 threads are about the same; 8 threads is almost as slow as 1 thread, maybe comparable to 2 threads, yet it still uses the whole CPU and its energy.
It appears to be a problem in the scheduling or something like that.
I can give it a try on my Mac later, but at least for now I can kind of confirm this... if anything it is worse than described here.
This laptop has 36 GB RAM, an MX130 with 4 GB of video memory (of which obviously none is used), and the CPU is an i7-8550U. My OS is Kubuntu 22.04.
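One detail worth noting: the i7-8550U has 4 physical cores and 8 logical CPUs (2-way hyper-threading), so the observed peak around 4 threads and the slowdown at 8 matches the physical core count, not the logical one. A small sketch of that heuristic (assuming 2-way SMT, which holds on most Intel/AMD desktop and laptop parts but is an assumption, not something the bindings report):

```python
import os

def recommended_threads(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Memory-bound LLM inference usually peaks near the physical core count.

    os.cpu_count() reports logical CPUs; dividing by the SMT width
    (assumed 2 here) approximates physical cores.
    """
    return max(1, logical_cpus // threads_per_core)

# i7-8550U: 8 logical CPUs -> 4 physical cores, matching the sweet spot above.
print(recommended_threads(8))                 # -> 4
print(recommended_threads(os.cpu_count() or 1))
```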
This is because llama.cpp is bottlenecked by memory bandwidth. So using more CPU cores doesn't help, because the system RAM simply can't keep up. GPUs work better because they have much faster memory.
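A back-of-the-envelope check makes the bandwidth argument concrete. During decoding, each generated token streams roughly the entire weight set from RAM, so tokens/sec is capped at about bandwidth divided by model size. The numbers below are illustrative assumptions (a ~4 GB 4-bit quantized GPT-J model and ~25 GB/s of effective dual-channel DDR4 bandwidth), not measurements:

```python
def bandwidth_bound_tps(model_bytes: float, mem_bandwidth_gbs: float) -> float:
    """Rough upper bound on tokens/sec for a memory-bound decoder:
    each token reads (roughly) all weights, so tps <= bandwidth / model size."""
    return mem_bandwidth_gbs * 1e9 / model_bytes

# Assumed: ~4 GB quantized model, ~25 GB/s effective memory bandwidth.
print(bandwidth_bound_tps(4e9, 25.0))  # -> 6.25 tokens/sec ceiling
```

A few cores already saturate that bandwidth, which is consistent with the reported 4.4 tokens/sec being flat across thread counts.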