gemma.cpp

num_threads doesn't seem to have any effect

Open windmaple opened this issue 1 year ago • 5 comments

./gemma
--tokenizer tokenizer.spm
--compressed_weights 2b-it-sfp.sbs
--model 2b-it
--verbosity 2
--num_threads 2

Increasing num_threads like this doesn't improve speed. Is this expected?

windmaple, Feb 25 '24 01:02

Just to confirm, do you have backslashes in that command line so that all are indeed passed in? Does the binary print num_threads : 2? Even two threads should help, but it depends on the platform. Maybe one core is already enough to saturate memory bandwidth?

jan-wassenberg, Feb 25 '24 10:02

[Screenshot: threads]

Yes, pretty sure gemma.cpp accepted the param. See the screenshot (was using 6 this time).

You are probably right; it may have hit the memory bottleneck even w/ 1 thread. Not sure how to check though.

Btw, this runs on an Android phone.

windmaple, Feb 25 '24 13:02

Speed won't always increase monotonically with the number of threads. It can be quite system dependent, so it takes a bit of experimentation. You might want to try 2b-it-sfp, which should be faster in general and may be less memory-bandwidth bound.

Neat to hear it's running on an Android phone! What model?

austinvhuang, Feb 25 '24 14:02

Running on Xiaomi 14

windmaple, Feb 26 '24 00:02

Good, so it's getting the argument value correctly. You can run STREAM to benchmark memory bandwidth; it also supports threading.

+1 to the SFP suggestion.

jan-wassenberg, Feb 26 '24 01:02

Closing for now. If there's anything that's not addressed above, feel free to chime in. I also added a small note to the README under "What are some easy ways to make the model run faster?" here: https://github.com/google/gemma.cpp?tab=readme-ov-file#troubleshooting-and-faqs

austinvhuang, Feb 29 '24 14:02