XNNPACK

Why does the benchmark program run with only one thread?

mszhanyi opened this issue 1 year ago · 1 comment

After building, I tried running some of the benchmark programs, such as f16-gemm-bench. Why does it run with only one thread? Are there any options to make it run with multiple threads?

mszhanyi · Dec 04 '24

You can use models/benchmark with --num_threads=2, etc.:

bazel build -c opt //bench/models:benchmark

models/benchmark --benchmark_filter=V2


Benchmark                    Time        CPU       Iterations  UserCounters...
FP32MobileNetV2/real_time    5327 us     5326 us   127         cpufreq=3.32637G
FP16MobileNetV2/real_time    16901 us    16901 us  40          cpufreq=3.45263G
QS8MobileNetV2/real_time     7883 us     7881 us   83          cpufreq=3.28256G

models/benchmark --benchmark_filter=V2 --num_threads=2

Benchmark                    Time        CPU       Iterations  UserCounters...
FP32MobileNetV2/real_time    3226 us     3226 us   217         cpufreq=3.35078G
FP16MobileNetV2/real_time    9327 us     9315 us   71          cpufreq=3.48561G
QS8MobileNetV2/real_time     4827 us     4827 us   136         cpufreq=3.27259G

If you run perf record/report or watch top, you'll see multiple threads running.
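For context, the per-microkernel benchmarks (like f16-gemm-bench) time individual kernels and, as far as I can tell, run them single-threaded, whereas the model-level benchmark creates a pthreadpool and hands it to the XNNPACK runtime. Below is a minimal sketch of that pattern, assuming the standard XNNPACK subgraph/runtime and pthreadpool C APIs; the subgraph contents are omitted, so this only illustrates how the thread pool is plumbed through, not a complete benchmark.

```cpp
// Minimal sketch (not from this thread): create a 2-thread pthreadpool and
// pass it to an XNNPACK runtime, which is roughly how models/benchmark
// obtains its worker threads. The subgraph is left empty for brevity.
#include <pthreadpool.h>
#include <xnnpack.h>

int main() {
  if (xnn_initialize(/*allocator=*/nullptr) != xnn_status_success) {
    return 1;
  }

  // 2 worker threads; pthreadpool_create(0) would use one thread per core.
  pthreadpool_t threadpool = pthreadpool_create(2);

  xnn_subgraph_t subgraph = nullptr;
  xnn_create_subgraph(/*external_value_ids=*/0, /*flags=*/0, &subgraph);
  // ... define tensors and operators on `subgraph` here ...

  // Operators in this runtime parallelize their work across `threadpool`.
  xnn_runtime_t runtime = nullptr;
  xnn_create_runtime_v3(subgraph, /*weights_cache=*/nullptr, threadpool,
                        /*flags=*/0, &runtime);

  // ... reshape/setup/invoke the runtime here ...

  xnn_delete_runtime(runtime);
  xnn_delete_subgraph(subgraph);
  pthreadpool_destroy(threadpool);
  xnn_deinitialize();
  return 0;
}
```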

There is also a TFLite benchmark_model tool in the TensorFlow GitHub repository (tensorflow/lite/tools/benchmark:benchmark_model). It gives similar results and works with a wide variety of TFLite models, since it runs on top of XNNPACK.

benchmark_model --graph=mobilenet_v2_1.00_224_int8.tflite --num_threads=1 --num_runs=5
INFO: Inference timings in us: Init: 18194, First inference: 9283, Warmup (avg): 8374.68, Inference (avg): 7927.9

benchmark_model --graph=mobilenet_v2_1.00_224_int8.tflite --num_threads=2 --num_runs=5
INFO: Inference timings in us: Init: 17629, First inference: 7933, Warmup (avg): 4846.4, Inference (avg): 4670.13
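If you drive TFLite from your own code instead of through benchmark_model, the XNNPACK thread count can be set on the delegate options. Here is a minimal sketch, assuming the public TFLite C++ API and the XNNPACK delegate header; the model file name is just the one from the example above.

```cpp
// Minimal sketch (not from this thread): load a .tflite model and apply the
// XNNPACK delegate configured with 2 threads.
#include <memory>

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  auto model = tflite::FlatBufferModel::BuildFromFile(
      "mobilenet_v2_1.00_224_int8.tflite");
  if (model == nullptr) return 1;

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // Configure the XNNPACK delegate to use 2 worker threads.
  TfLiteXNNPackDelegateOptions options = TfLiteXNNPackDelegateOptionsDefault();
  options.num_threads = 2;
  TfLiteDelegate* delegate = TfLiteXNNPackDelegateCreate(&options);
  interpreter->ModifyGraphWithDelegate(delegate);

  interpreter->AllocateTensors();
  interpreter->Invoke();

  // The delegate must outlive the interpreter that uses it.
  interpreter.reset();
  TfLiteXNNPackDelegateDelete(delegate);
  return 0;
}
```

Internally the delegate creates a pthreadpool of that size, which is the same mechanism shown in the earlier sketch.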

fbarchard · Jan 23 '25