Reintroducing min_query_count for SingleStream (64) and Server (662)
The minimum query count (min_query_count) was removed from mlperf.conf a while ago. I believe the thinking was that submitters could choose how many samples to process. As long as the minimum run duration (min_duration) constraint of 10 minutes was met, early stopping would take care of estimating the 90th percentile for SingleStream, and the 99th percentile for MultiStream and Server.
However, early stopping still requires at least 64 samples to estimate the 90th percentile for SingleStream. Similarly, it requires at least 662 samples to estimate the 99th percentile for MultiStream and Server. So trying to process fewer than 64 samples for SingleStream, or fewer than 662 queries for Server, will result in INVALID runs.
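For anyone wondering where 64 and 662 might come from, here is a rough sketch. It assumes (this is my assumption, not something taken from the loadgen source) that early stopping is a one-sided binomial test at 99% confidence that must be able to tolerate at least one over-latency query. The function name `min_queries` and its parameters are hypothetical.

```python
import math


def min_queries(percentile, confidence=0.99, tolerated=1):
    """Smallest n for which observing at most `tolerated` over-latency
    queries out of n has probability <= 1 - confidence when the true
    over-latency rate is exactly 1 - percentile.

    Assumption: this models early stopping as a plain binomial test;
    the real loadgen implementation may differ in detail.
    """
    p_over = 1.0 - percentile
    n = tolerated + 1
    while True:
        # P(X <= tolerated) for X ~ Binomial(n, p_over)
        tail = sum(
            math.comb(n, k) * p_over**k * (1.0 - p_over) ** (n - k)
            for k in range(tolerated + 1)
        )
        if tail <= 1.0 - confidence:
            return n
        n += 1


print(min_queries(0.90))  # 90th percentile (SingleStream)
print(min_queries(0.99))  # 99th percentile (MultiStream, Server)
```

Under these assumptions the two calls give 64 and 662, which matches the minimums quoted above.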
Perhaps it would be better to reintroduce these constraints to match the one for MultiStream:
*.MultiStream.min_query_count = 662
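Concretely, the reintroduced entries in mlperf.conf could look something like this, following the same pattern as the MultiStream line (values taken from the early stopping minimums above):

```
*.SingleStream.min_query_count = 64
*.Server.min_query_count = 662
```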