inference
Setting `min_query_count` for GPTJ
Running GPTJ can be quite demanding even on accelerated systems, as the Server latency constraint of 20 seconds suggests. For systems close to this threshold, meeting the minimum run duration of 10 minutes would only require processing around 30 samples.
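As a rough sanity check (the ~20 s per sample and 10-minute figures are the ones quoted above), the number of samples needed to satisfy the minimum duration can be estimated as:

```python
import math

# Assumed figures from the discussion above: roughly 20 s per sample on a
# system close to the Server latency constraint, and a 10-minute (600 s)
# minimum run duration.
seconds_per_sample = 20
min_duration_s = 10 * 60

# Samples needed so that the total runtime covers the minimum duration.
min_samples = math.ceil(min_duration_s / seconds_per_sample)
print(min_samples)  # -> 30
```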
However, when trying to set `min_query_count` in `user.conf` (or indeed in `mlperf.conf` proper), e.g.:
```
gptj.SingleStream.min_query_count = 100
gptj.SingleStream.max_query_count = 100
gptj.SingleStream.performance_sample_count_override = 13368
gptj.SingleStream.target_latency = 19000
```
I still see in `mlperf_log_summary.txt`:
```
min_query_count : 13368
max_query_count : 100
```
with the following experiment summary:
```
================================================
MLPerf Results Summary
================================================
SUT name : KILT_SERVER
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : xxxxxxxxxx
Result is : INVALID
  Min duration satisfied : Yes
  Min queries satisfied : NO
  Early stopping satisfied: Yes
Recommendations:
 * The test exited early, before enough queries were issued.
   See the detailed log for why this may have occurred.
Early Stopping Result:
 * Processed at least 64 queries (100).
 * Would discard 2 highest latency queries.
 * Early stopping 90th percentile estimate: yyyyyyyyyy
 * Not enough queries processed for 99th percentile early stopping
   estimate (would need to process at least 662 total queries).
```
Is there any reason why LoadGen enforces this? I know we agreed that the minimum number of queries for Offline should cover the whole dataset, i.e. `min_query_count == performance_sample_count_override == 13368` for GPTJ. That may be OK for Offline and Server, but for GPTJ SingleStream at ~20 seconds per sample we would be looking at a run of over 3 days (and double that for a power run!)
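For reference, the 3-day figure above is a back-of-the-envelope estimate, assuming ~20 s per sample over the full 13368-sample dataset:

```python
# Hypothetical estimate: forcing min_query_count to cover the whole
# GPTJ dataset (13368 samples) at roughly 20 s per sample.
samples = 13368
seconds_per_sample = 20

total_s = samples * seconds_per_sample  # 267360 s
days = total_s / 86400                  # seconds per day

print(f"{days:.1f} days")      # -> 3.1 days for a performance run
print(f"{2 * days:.1f} days")  # -> 6.2 days including the power run
```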
@mrmhodak @pgmpablo157321