inference
Interactive: still occasionally a separate workload rather than a fully fledged scenario?
It's confusing that mlperf.conf still models the Interactive case as separate workloads (e.g. `llama2-70b-interactive`) running under the Server scenario, rather than as a fully fledged scenario in its own right.
```
$ grep -Ri interactive inference/loadgen/*
docs/src/doxygen.cfg:# enable generation of interactive SVG images that allow zooming and panning.
docs/src/doxygen.cfg:INTERACTIVE_SVG = YES
mlperf.conf:llama2-70b-interactive.*.performance_sample_count_override = 24576
mlperf.conf:llama3_1-405b-interactive.*.performance_sample_count_override = 8313
mlperf.conf:llama3_1-8b-interactive.*.performance_sample_count_override = 13368
mlperf.conf:llama2-70b-interactive.*.sample_concatenate_permutation = 1
mlperf.conf:llama3_1-405b-interactive.*.sample_concatenate_permutation = 1
mlperf.conf:llama3_1-8b-interactive.*.sample_concatenate_permutation = 1
mlperf.conf:llama2-70b-interactive.*.use_token_latencies = 1
mlperf.conf:llama3_1-405b-interactive.*.use_token_latencies = 1
mlperf.conf:llama3_1-8b-interactive.*.use_token_latencies = 1
mlperf.conf:# Target Latencies for interactive setting
mlperf.conf:llama2-70b-interactive.Server.target_latency = 0
mlperf.conf:llama2-70b-interactive.Server.ttft_latency = 450
mlperf.conf:llama2-70b-interactive.Server.tpot_latency = 40
mlperf.conf:# Target Latencies for interactive setting
mlperf.conf:llama3_1-405b-interactive.Server.target_latency = 0
mlperf.conf:llama3_1-405b-interactive.Server.ttft_latency = 4500
mlperf.conf:llama3_1-405b-interactive.Server.tpot_latency = 80
mlperf.conf:# Target Latencies for interactive setting
mlperf.conf:llama3_1-8b-interactive.Server.target_latency = 0
mlperf.conf:llama3_1-8b-interactive.Server.ttft_latency = 500
mlperf.conf:llama3_1-8b-interactive.Server.tpot_latency = 30
```
Still, I assume that user.conf for Interactive is expected to look like:

```
*.Server.target_qps = <target QPS lower than for Server, to meet the more stringent latency constraints>
*.Server.min_duration = <minimum duration in milliseconds; at least 600,000>
```

(Here `*` should cover both `llama2-70b-interactive` and `llama2-70b`, whichever is correct.)
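As a concrete sketch, with hypothetical values (the QPS here is made up for illustration, not taken from any submission; only the 600,000 ms minimum duration is a known constraint):

```
# Hypothetical example user.conf for an interactive Llama2-70B run.
# target_qps is illustrative only; tune it so the TTFT/TPOT
# constraints from mlperf.conf (450 ms / 40 ms) are met.
*.Server.target_qps = 8.0
*.Server.min_duration = 600000
```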
I'm not sure whether PR #2281 was required. For us, just fixing the missing comma as in PR #2283 seemed sufficient.
@pgmpablo157321 Could you take a look?