DeepSpeedExamples

Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?

Open goelayu opened this issue 1 year ago • 0 comments

The MII inference benchmark script computes throughput as `num_clients/latency`. Shouldn't this be `num_queries/latency`?

Also, for the purposes of computing throughput, why use P95 latency rather than the total time it took to process all the requests?
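
To make the difference concrete, here is a minimal sketch comparing the two calculations. It is not the benchmark script itself: the function names, the nearest-rank percentile, and the sample latencies are all hypothetical, and it assumes each client sends its queries back to back.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile (an assumption; the script may use another method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def throughput_clients_over_p95(num_clients, latencies):
    """The calculation described above: num_clients / P95 latency."""
    return num_clients / nearest_rank_percentile(latencies, 95)


def throughput_queries_over_wall_time(num_queries, wall_time):
    """The alternative: completed queries / total elapsed time."""
    return num_queries / wall_time


# Hypothetical workload: 4 clients, each sending 8 queries sequentially.
# Seven queries take 0.4 s and one slow query takes 1.2 s.
num_clients = 4
per_client_latencies = [0.4] * 7 + [1.2]
all_latencies = per_client_latencies * num_clients
num_queries = len(all_latencies)

# Clients run in parallel, so wall time is the longest per-client total.
wall_time = sum(per_client_latencies)
mean_latency = sum(all_latencies) / num_queries

by_p95 = throughput_clients_over_p95(num_clients, all_latencies)
by_wall_time = throughput_queries_over_wall_time(num_queries, wall_time)
by_mean_latency = num_clients / mean_latency

print(f"num_clients / P95 latency:  {by_p95:.2f} queries/s")
print(f"num_queries / wall time:    {by_wall_time:.2f} queries/s")
print(f"num_clients / mean latency: {by_mean_latency:.2f} queries/s")
```

With these made-up numbers, `num_clients / P95 latency` gives about 3.33 queries/s, while `num_queries / wall time` gives 8 queries/s. Dividing `num_clients` by the mean latency also gives 8 queries/s in this closed-loop setup, so the gap here comes from using a tail percentile rather than from the numerator.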

Feb 03 '24 00:02 goelayu