DeepSpeedExamples

Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?

Open goelayu opened this issue 5 months ago • 0 comments

The MII inference benchmark script computes throughput as `num_clients/latency`. Shouldn't this be `num_queries/latency`?

Also, why does the throughput computation use P95 latency rather than the total time it took to process all the requests?
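To make the difference concrete, here is a minimal sketch with made-up numbers. It is not the benchmark script's actual code: the function names, the per-client latency lists, and the assumption that each client sends its queries back-to-back are all hypothetical, chosen only to compare the two formulas.

```python
# Hypothetical illustration only: names and numbers are assumptions,
# not taken from the MII benchmark script.

def p95_nearest_rank(values):
    """95th percentile using the nearest-rank method (integer arithmetic)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]

def throughput_clients_over_latency(num_clients, latency_s):
    """The formula questioned above: num_clients / latency."""
    return num_clients / latency_s

def throughput_queries_over_total_time(num_queries, total_time_s):
    """The alternative: total queries / wall-clock time for the whole run."""
    return num_queries / total_time_s

# Assumed workload: 4 concurrent clients, each sending 5 queries back-to-back.
# Each inner list holds one client's per-query latencies in seconds.
per_client_latencies = [
    [0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 2.0],  # one slow query on the last client
]

num_clients = len(per_client_latencies)
all_latencies = [lat for client in per_client_latencies for lat in client]
num_queries = len(all_latencies)

# Clients run concurrently, so the run ends when the slowest client finishes.
total_time_s = max(sum(client) for client in per_client_latencies)

p95_s = p95_nearest_rank(all_latencies)
by_clients = throughput_clients_over_latency(num_clients, p95_s)
by_total_time = throughput_queries_over_total_time(num_queries, total_time_s)

print(f"P95 latency: {p95_s} s")
print(f"num_clients / P95 latency: {by_clients} queries/s")
print(f"num_queries / total time:  {by_total_time} queries/s")
```

With these assumed numbers the two formulas disagree (8.0 vs. 5.0 queries/s), because the single slow query falls above the 95th percentile but still extends the total run time. Dividing the number of clients by a latency can approximate steady-state throughput when every client always has exactly one request in flight, but the result then depends on which latency statistic is used.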

goelayu, Feb 03 '24 00:02