gemma.cpp num_threads doesn't seem to have any effect

windmaple opened this issue • 5 comments

./gemma
--tokenizer tokenizer.spm
--compressed_weights 2b-it-sfp.sbs
--model 2b-it
--verbosity 2
--num_threads 2

Increasing num_threads like this doesn't improve speed. Is this expected?

Feb 25 '24 01:02 windmaple

Just to confirm: do you have backslashes in that command line, so that all of the arguments are actually passed in? Does the binary print num_threads : 2? Even two threads should help, but it depends on the platform. Maybe one core is already enough to saturate memory bandwidth?

Feb 25 '24 10:02 jan-wassenberg
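To put rough numbers on the memory-bandwidth question above: when generating one token at a time, a dense model reads roughly all of its weights once per token, so decode speed is capped at about (sustained memory bandwidth) / (weight size), however many threads are used. The sketch below just does that division. max_tokens_per_sec is a hypothetical helper, and the 50 GB/s and 2.5 GB figures are placeholders, not measurements of any phone or of the 2b-it-sfp.sbs file. It also ignores KV-cache traffic and caches, so treat the result as a rough ceiling only.

```shell
#!/bin/sh
# Rough upper bound on decode speed for a memory-bandwidth-bound model.
# Assumption: each generated token streams all weights from DRAM once
# (ignores KV-cache reads, CPU caches, and compute limits).
#   $1 = sustained memory bandwidth in GB/s (e.g. as measured with STREAM)
#   $2 = size of the weights in GB
max_tokens_per_sec() {
  awk -v bw="$1" -v w="$2" 'BEGIN { printf "%.1f\n", bw / w }'
}

# Hypothetical numbers, for illustration only.
max_tokens_per_sec 50 2.5   # prints 20.0
```

If single-threaded generation is already close to this ceiling, extra threads have little room to help.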

[Screenshot: threads]

Yes, I'm pretty sure gemma.cpp accepted the parameter. See the screenshot (I was using 6 threads this time).

You're probably right; it may be hitting the memory-bandwidth bottleneck even with 1 thread. I'm not sure how to check, though.

By the way, this is running on an Android phone.

Feb 25 '24 13:02 windmaple

Speed won't always increase monotonically with the number of threads; it can be quite system-dependent, so finding the best setting takes a bit of experimentation. You might want to try 2b-it-sfp, which should be faster in general and may be less memory-bandwidth bound.

Neat to hear it's running on an Android phone! What model?

Feb 25 '24 14:02 austinvhuang
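Since the best thread count has to be found by experiment, one option is to rerun the command from the issue at several --num_threads values and compare the same prompt across runs. This is a minimal sketch, assuming ./gemma, tokenizer.spm and 2b-it-sfp.sbs sit in the current directory as in the original command; sweep_threads and the GEMMA override variable are hypothetical names. Each run starts the interactive prompt, so type the same prompt each time and compare how quickly it generates (the --verbosity 2 output may include timing; if not, time it by hand).

```shell
#!/bin/sh
# Sweep --num_threads for the command from this issue.
# Assumptions: ./gemma, tokenizer.spm and 2b-it-sfp.sbs are in the current
# directory; GEMMA can be overridden (e.g. GEMMA=echo for a dry run).
GEMMA=${GEMMA:-./gemma}

sweep_threads() {
  for t in "$@"; do
    echo "=== num_threads=$t ==="
    # Backslashes keep this one command, so every flag reaches the binary.
    "$GEMMA" \
      --tokenizer tokenizer.spm \
      --compressed_weights 2b-it-sfp.sbs \
      --model 2b-it \
      --verbosity 2 \
      --num_threads "$t"
  done
}

# Example: sweep_threads 1 2 4 6 8
```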

Running on a Xiaomi 14.

Feb 26 '24 00:02 windmaple

Good, so it's getting the argument value correctly. You can run the STREAM benchmark to measure memory bandwidth; it also supports threading.

+1 to the SFP suggestion.

Feb 26 '24 01:02 jan-wassenberg
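STREAM's reference C implementation can be built with OpenMP, and the thread count is then chosen with the standard OMP_NUM_THREADS environment variable. This is a minimal sketch, assuming a ./stream binary already built from stream.c with OpenMP enabled (for example cc -O2 -fopenmp stream.c -o stream, or the Android NDK/Termux equivalent on a phone); stream_sweep and the STREAM override variable are hypothetical names. If the reported rates (Triad is a common summary) stop rising after one or two threads, gemma.cpp is likely to stop scaling at about the same point.

```shell
#!/bin/sh
# Run STREAM at several thread counts to see where memory bandwidth saturates.
# Assumption: ./stream was built from stream.c with OpenMP enabled, e.g.
#   cc -O2 -fopenmp stream.c -o stream
STREAM=${STREAM:-./stream}

stream_sweep() {
  for t in "$@"; do
    echo "=== OMP_NUM_THREADS=$t ==="
    # OpenMP reads OMP_NUM_THREADS at startup to size its thread team.
    # Keep only the per-kernel result lines (Copy:, Scale:, Add:, Triad:).
    OMP_NUM_THREADS="$t" "$STREAM" | grep -E '^(Copy|Scale|Add|Triad):'
  done
}

# Example: stream_sweep 1 2 4 8
```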

Closing for now. If anything above hasn't been addressed, feel free to chime in. I also added a small note to the README under "What are some easy ways to make the model run faster?": https://github.com/google/gemma.cpp?tab=readme-ov-file#troubleshooting-and-faqs

Feb 29 '24 14:02 austinvhuang
