llama.cpp

Not an issue, but what depends on the number of threads?

Open alexcardo opened this issue 1 year ago • 2 comments

I've been testing your code with 1 to 8 threads, and the output is always different. Speed doesn't scale with the number of threads the way I expected: 4 threads may perform much better than 1, and 8 threads should supposedly do better still, yet the same prompt may give equally good output about three times faster with 4 threads than with 8. When I use 8 threads (the maximum on my M1), all my CPU resources are in use, but it doesn't improve speed at all (if anything it seems slower) and doesn't seem to improve quality either. Am I wrong? Can you correct me if I'm mistaken? Maybe there is some optimal speed/quality setting and I just couldn't figure out how to use it?

alexcardo · Mar 15 '23 16:03

The code becomes memory-bound somewhere between 8 and 16 threads on my 16-core system. I suspect your system has 4 physical cores / 8 hyperthreads; if so, hyperthreading isn't helping your performance.
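A rough way to picture the memory-bound argument: each generated token has to stream the model weights from RAM, so throughput is capped by whichever is smaller, compute or memory bandwidth. The sketch below is a hypothetical back-of-the-envelope model, not llama.cpp code; `estimated_tokens_per_sec` and all of its numbers (per-core rate, bandwidth, model size) are made-up illustrative values, and it assumes hyperthreads add no compute beyond the physical cores.

```python
# Hypothetical throughput model: tokens/s is limited either by compute
# (which scales with *physical* cores in use) or by memory bandwidth.

def estimated_tokens_per_sec(threads: int,
                             physical_cores: int,
                             per_core_tok_s: float,
                             bandwidth_gb_s: float,
                             model_size_gb: float) -> float:
    # Assumption: threads beyond the physical core count add no compute.
    effective_cores = min(threads, physical_cores)
    compute_limit = effective_cores * per_core_tok_s
    # Assumption: every token reads the full set of weights once.
    memory_limit = bandwidth_gb_s / model_size_gb
    return min(compute_limit, memory_limit)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    for label, cores in (("16-core box", 16), ("4-core/8-thread box", 4)):
        for t in (1, 2, 4, 8, 16):
            rate = estimated_tokens_per_sec(t, physical_cores=cores,
                                            per_core_tok_s=1.0,
                                            bandwidth_gb_s=40.0,
                                            model_size_gb=4.0)
            print(f"{label}: {t:2d} threads -> {rate:.1f} tok/s")
```

With these made-up numbers, the 16-core machine plateaus at 10 tok/s somewhere between 8 and 16 threads, while the 4-core/8-thread machine stops improving at 4 threads.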

The output may change subtly with different numbers of threads because of how the code splits work across threads, but the average quality shouldn't.
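One generic mechanism that can make results depend on the thread count is that floating-point addition is not associative: if each thread sums its own slice and the partial sums are then combined, changing the number of threads changes where rounding happens. Whether llama.cpp's kernels actually reduce in a thread-dependent order is an assumption here; `chunked_sum` is a hypothetical helper that only demonstrates the arithmetic effect.

```python
from typing import List


def chunked_sum(values: List[float], threads: int) -> float:
    """Sum `values` as a hypothetical static thread split would: each
    'thread' sums its contiguous chunk, then the partial sums are added
    together in thread order."""
    chunk = -(-len(values) // threads)  # ceiling division
    partials = []
    for start in range(0, len(values), chunk):
        acc = 0.0
        for v in values[start:start + chunk]:
            acc += v  # plain left-to-right IEEE-754 double addition
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


if __name__ == "__main__":
    values = [1e16, 1.0, -1e16, 1.0]
    for t in (1, 2, 4):
        # prints 1.0 for 1 thread, 0.0 for 2 threads, 1.0 for 4 threads
        print(f"{t} thread(s): {chunked_sum(values, t)}")
```

Explicit loops are used instead of the built-in `sum()` because Python 3.12+ applies compensated summation to floats, which would hide the effect.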

gjmulder · Mar 15 '23 16:03

M1 definitely has 8 physical cores (and I believe it has fairly high memory bandwidth, but I may be wrong). It could have something to do with 4 of those cores being lower-performance efficiency cores, but spreading the workload across more cores should still improve performance.
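To illustrate why efficiency cores can matter: if each step's work is split into equal chunks, one per thread (an assumption here, not something established in this thread), the step ends only when the slowest thread finishes. The relative speeds below (1.0 for a performance core, 0.3 for an efficiency core) are hypothetical, not measured M1 figures, and both helper functions are hypothetical.

```python
from typing import List


def static_split_time(work: float, core_speeds: List[float]) -> float:
    """Wall time for one step when `work` is cut into equal chunks,
    one per thread, and each thread runs on a core of the given
    relative speed. The step ends when the slowest thread finishes."""
    chunk = work / len(core_speeds)
    return max(chunk / speed for speed in core_speeds)


def dynamic_split_time(work: float, core_speeds: List[float]) -> float:
    """Idealised dynamic scheduling: faster cores take proportionally
    more work, so time is total work divided by total throughput."""
    return work / sum(core_speeds)


if __name__ == "__main__":
    P, E = 1.0, 0.3  # hypothetical relative core speeds, not measured
    four_p = [P] * 4
    four_p_four_e = [P] * 4 + [E] * 4
    print("4 threads on P-cores, static split:", static_split_time(1.0, four_p))
    print("8 threads, static split:           ", static_split_time(1.0, four_p_four_e))
    print("8 threads, dynamic split:          ", dynamic_split_time(1.0, four_p_four_e))
```

Under a static equal split with work W, 8 threads only beat 4 when W/(8s) < W/4, i.e. when the efficiency cores run at more than half the speed (s > 0.5) of the performance cores; under an ideal dynamic split, adding a core never makes the step slower.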

j-f1 · Mar 15 '23 17:03

Going from 4 to 7-8 threads helps, but only marginally. Maybe it would help more if the threads were pinned to cores.
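On pinning: on Linux, Python exposes `os.sched_setaffinity`, and child processes started afterwards inherit the affinity mask. macOS does not provide this call (Python's `os` module lacks it there), so this cannot be tried directly on an M1 Mac. `pin_to_cpus` is a hypothetical helper for illustration; whether pinning actually speeds up llama.cpp is untested here.

```python
import os
from typing import Optional, Set


def pin_to_cpus(cpus: Set[int]) -> Optional[Set[int]]:
    """Restrict the calling process (pid 0 = self) to the given CPU ids.

    Returns the affinity mask in effect afterwards, or None on platforms
    without sched_setaffinity (e.g. macOS)."""
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    target = cpus & allowed  # ignore CPUs this process may not use
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        first_four = set(sorted(os.sched_getaffinity(0))[:4])
        print("pinned to:", pin_to_cpus(first_four))
    else:
        print("sched_setaffinity is not available on this platform")
```

A launcher script could call this before starting the inference binary with `subprocess`, so the child runs only on the chosen cores.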

namliz · Mar 15 '23 18:03