Merge LLM interactive scenarios into the benchmark (as a separate server scenario, instead of a separate benchmark)
As titled. The required changes might be:
- LoadGen to support Server scenarios with more than one set of latency thresholds (TTFT/TPOT)
- A flag for the user to select the latency scenario (probably via userSettings)
- Accuracy checker to validate the result against the correct thresholds (see the sketch after this list)
- Result table to distinguish the two Server scenarios
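A minimal sketch of what the threshold selection and check could look like, assuming a simple lookup table keyed by scenario; the function name is hypothetical, and while the TTFT/TPOT values mirror the llama2-70b numbers in mlperf.conf, treat the whole table as illustrative:

```python
# Illustrative only: per-scenario latency constraints and a check helper.
# Values mirror llama2-70b's mlperf.conf entries but are placeholders here.
LATENCY_LIMITS_MS = {
    "Server":      {"ttft": 2000, "tpot": 200},
    "Interactive": {"ttft": 450,  "tpot": 40},
}

def meets_latency_limits(scenario: str, ttft_ms: float, tpot_ms: float) -> bool:
    """Check measured percentile TTFT/TPOT against the limits for the
    scenario the user selected (e.g. through userSettings)."""
    limits = LATENCY_LIMITS_MS[scenario]
    return ttft_ms <= limits["ttft"] and tpot_ms <= limits["tpot"]
```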
@pgmpablo157321 Could you take a look and help figure out whether these changes can be implemented before the v5.1 code freeze next Tuesday?
@nvzhihanj @hanyunfan I opened a PR with the changes needed for this. No LoadGen changes were needed; we are just adding a scenario in the submission checker and making sure the run for this scenario is a Server run with the correct latencies. The submission will look something like this:
- results
    - llama2-70b
        - Server
        - Offline
        - Interactive (optional)
            - accuracy
                - mlperf_log_accuracy.json
                - mlperf_log_detail.txt (server run with interactive latencies)
                - mlperf_log_summary.txt (server run with interactive latencies)
            - performance
                - run1 (server run with interactive latencies)
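For the "Server run with the correct latencies" part, the check could look roughly like the following; mlperf_log_detail.txt lines do carry a `:::MLLOG` prefix and a `requested_scenario` key, but the helper itself is an illustration, not the PR's actual code:

```python
# Illustrative only: verify that the logs under Interactive/ come from a
# Server-mode run by scanning mlperf_log_detail.txt.
import json

def is_server_run(detail_log_path: str) -> bool:
    with open(detail_log_path) as f:
        for line in f:
            if not line.startswith(":::MLLOG"):
                continue  # skip anything that is not a structured log line
            entry = json.loads(line[len(":::MLLOG "):])
            if entry.get("key") == "requested_scenario":
                return entry.get("value") == "Server"
    return False
```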
Pablo will run more tests on it.
NVIDIA's proposal on 6/17:
- Choice 1: a Datacenter submission has to have Offline plus at least one of Server and Interactive.
- Choice 2: a Datacenter submission has to have Offline and Server, with Interactive as an optional extra.

MLCommons prefers Choice 1 and is giving everyone one week to think about it.
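The difference between the two choices reduces to a small predicate. A sketch under the assumption that the submitted scenarios are represented as a set of names (not the submission checker's real API):

```python
# Illustrative only: which scenario sets are valid under each choice.
def scenarios_valid(submitted: set, choice: int) -> bool:
    if "Offline" not in submitted:
        return False  # Offline is required under both choices
    if choice == 1:
        # Choice 1: at least one of Server / Interactive alongside Offline.
        return bool(submitted & {"Server", "Interactive"})
    # Choice 2: Server is required; Interactive is an optional extra.
    return "Server" in submitted

# Under Choice 1 an Offline+Interactive submission is valid; under Choice 2 it is not.
assert scenarios_valid({"Offline", "Interactive"}, choice=1)
assert not scenarios_valid({"Offline", "Interactive"}, choice=2)
```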
@pgmpablo157321 I think all the interactive parameters and latencies for llama-405B and 8B are missing from mlperf.conf: https://github.com/mlcommons/inference/blob/0a3570efb0309b5581f2831d84c05fe5483b5ef7/loadgen/mlperf.conf#L60
Can you help add them?
Also, in the existing mlperf.conf, llama2-interactive still looks like a separate benchmark. I'm not sure whether we can change it to llama2-70b.interactive.xxx this round: https://github.com/mlcommons/inference/blob/0a3570efb0309b5581f2831d84c05fe5483b5ef7/loadgen/mlperf.conf#L94
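To make the renaming concrete, a key like `llama2-70b-interactive.Server.ttft_latency` would become `llama2-70b.Interactive.ttft_latency` under the per-scenario form. A throwaway sketch of that rewrite, with both spellings being the forms under discussion rather than settled syntax:

```python
# Illustrative only: rewrite a separate-benchmark conf key into the
# proposed per-scenario form, e.g.
#   "llama2-70b-interactive.Server.ttft_latency"
#   -> "llama2-70b.Interactive.ttft_latency"
def to_per_scenario_key(key: str) -> str:
    model_scenario, _, setting = key.rpartition(".")
    model, _, scenario = model_scenario.partition(".")
    if model.endswith("-interactive") and scenario == "Server":
        model = model[: -len("-interactive")]
        return f"{model}.Interactive.{setting}"
    return key  # already per-scenario (or unrelated), leave unchanged
```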
Linking this thread to the discussion in PR #2224