DeepSpeedExamples
Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?
The MII inference benchmark script computes throughput as `num_clients/latency`. Shouldn't this be `num_queries/latency`?
Also, for the purposes of computing throughput, why use P95 latency rather than the total time it took to process all the requests?
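
To make the question concrete, here is a minimal sketch of the two calculations. The numbers and function names are hypothetical and are not taken from the benchmark script; they only show that the two formulas can give different results for the same run.

```python
def throughput_from_clients(num_clients: int, latency_s: float) -> float:
    """Throughput as the script appears to compute it: num_clients / latency."""
    return num_clients / latency_s


def throughput_from_queries(num_queries: int, total_time_s: float) -> float:
    """Throughput as completed queries per second of wall-clock time."""
    return num_queries / total_time_s


# Hypothetical run (assumed values, not measured):
num_clients = 4        # concurrent clients
num_queries = 40       # total requests completed across all clients
p95_latency_s = 2.0    # 95th-percentile per-request latency
total_time_s = 16.0    # wall-clock time to finish all requests

per_client_based = throughput_from_clients(num_clients, p95_latency_s)
query_based = throughput_from_queries(num_queries, total_time_s)

print(f"num_clients / P95 latency : {per_client_based:.2f} queries/s")
print(f"num_queries / total time  : {query_based:.2f} queries/s")
```

With these assumed values the first formula gives 2.00 queries/s and the second gives 2.50 queries/s, which is the discrepancy I am asking about.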