XNNPACK
How can I parallelize the execution of this benchmark? (https://github.com/google/XNNPACK/blob/master/bench/spmm-benchmark.h)
The end2end_bench shows SpMM running on ARM with threads.
The end2end benchmarks measure a full model, don't they? Can I benchmark individual GEMMs instead? Do you have an example in mind that shows how to benchmark individual GEMMs with different numbers of threads?
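For illustration only, here is a minimal sketch of the general pattern being asked about: timing one GEMM while sweeping the thread count. It is not XNNPACK code and does not use XNNPACK's benchmark harness or its thread pool. It uses a naive row-partitioned GEMM on plain `std::thread`, and the helper names `gemm_rows`, `timed_gemm`, and `sweep_threads` are hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not an XNNPACK function): computes rows
// [row_begin, row_end) of C = A * B, where A is m x k, B is k x n,
// C is m x n, and all matrices are row-major.
static void gemm_rows(const float* a, const float* b, float* c,
                      size_t row_begin, size_t row_end, size_t k, size_t n) {
  for (size_t i = row_begin; i < row_end; i++) {
    for (size_t j = 0; j < n; j++) {
      float acc = 0.0f;
      for (size_t p = 0; p < k; p++) {
        acc += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = acc;
    }
  }
}

// Hypothetical helper: splits the m rows of C across up to num_threads
// worker threads, runs the GEMM once, and returns elapsed wall time in
// seconds. Thread start-up cost is included in the measurement.
static double timed_gemm(const std::vector<float>& a,
                         const std::vector<float>& b,
                         std::vector<float>& c,
                         size_t m, size_t k, size_t n, size_t num_threads) {
  num_threads = std::max<size_t>(1, num_threads);
  const size_t rows_per_thread = (m + num_threads - 1) / num_threads;

  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> workers;
  for (size_t t = 0; t < num_threads; t++) {
    const size_t begin = std::min(m, t * rows_per_thread);
    const size_t end = std::min(m, begin + rows_per_thread);
    if (begin == end) break;  // no rows left for this thread
    workers.emplace_back(gemm_rows, a.data(), b.data(), c.data(),
                         begin, end, k, n);
  }
  for (auto& w : workers) w.join();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: runs the same GEMM once per requested thread count
// and returns the elapsed seconds for each, in the same order.
static std::vector<double> sweep_threads(size_t m, size_t k, size_t n,
                                         const std::vector<size_t>& counts) {
  std::vector<float> a(m * k, 1.0f), b(k * n, 2.0f), c(m * n, 0.0f);
  std::vector<double> seconds;
  for (size_t threads : counts) {
    seconds.push_back(timed_gemm(a, b, c, m, k, n, threads));
  }
  return seconds;
}
```

Calling `sweep_threads(512, 512, 512, {1, 2, 4})` from a `main` function would return one timing per thread count. A real measurement would repeat each configuration several times and reuse a persistent thread pool, so that thread creation does not dominate small problem sizes.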