Zhihan Jiang
Thanks Arjun, I think we are okay with the change as long as it doesn't break the behavior of the existing 4.0 and 4.1 benchmarks. Have you tested the workloads?
@mrmhodak @pgmpablo157321 can we merge this PR? This is blocking #1884.
@arjunsuresh to help^
I believe consolidate_results.py is not needed if the pickle input file already has all the samples (24576). That script is a by-product of the preprocessing that @nv-alicheng uses, IIRC.
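For reference, a minimal way to confirm the pickle already contains the full sample set (a sketch only; the file name and the assumption that it is a pandas DataFrame pickle mirror the llama2-70b OpenOrca setup and may need adjusting):

```python
# Minimal sanity check that the preprocessed pickle already holds all
# 24576 samples; file name / DataFrame layout are assumptions from the
# llama2-70b OpenOrca preprocessing, adjust to your environment.
import pandas as pd

EXPECTED_SAMPLES = 24576
df = pd.read_pickle("open_orca_gpt4_tokenized_llama.sampled_24576.pkl")

print(f"samples in pickle: {len(df)}")
if len(df) == EXPECTED_SAMPLES:
    print("full sample set present -- consolidate_results.py should not be needed")
else:
    print("sample count mismatch -- the pickle may need consolidation or re-preprocessing")
```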
We didn't use vLLM when creating Llama2-70B - feel free to use any version that works
@nv-alicheng to review. @pgmpablo157321 how hard would it be to make interactive a third scenario, but use the same code path as server? If it's too complicated we can live...
@pgmpablo157321 I think all the interactive parameters and latencies for llama-405B and 8B are missing from mlperf.conf: https://github.com/mlcommons/inference/blob/0a3570efb0309b5581f2831d84c05fe5483b5ef7/loadgen/mlperf.conf#L60 Can you help add them?
Also, in the existing mlperf.conf, llama2-interactive still seems to be defined as a separate benchmark. Not sure if we can change it to llama2-70b.interactive.xxx this round: https://github.com/mlcommons/inference/blob/0a3570efb0309b5581f2831d84c05fe5483b5ef7/loadgen/mlperf.conf#L94
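Something along these lines is roughly what I'd expect the missing entries to look like (a sketch only; the section prefixes follow the existing llama2-70b-interactive style, `ttft_latency`/`tpot_latency` match the keys already used for llama2-70b, and the values are placeholders, not agreed targets):

```
# Hypothetical interactive entries -- placeholder values, to be filled in
llama3_1-405b-interactive.Server.target_latency = 0
llama3_1-405b-interactive.Server.use_token_latencies = 1
llama3_1-405b-interactive.Server.ttft_latency = <TODO>
llama3_1-405b-interactive.Server.tpot_latency = <TODO>

llama3_1-8b-interactive.Server.target_latency = 0
llama3_1-8b-interactive.Server.use_token_latencies = 1
llama3_1-8b-interactive.Server.ttft_latency = <TODO>
llama3_1-8b-interactive.Server.tpot_latency = <TODO>
```

If interactive instead becomes a proper third scenario (per the question above), the prefixes would presumably change to something like `llama3_1-405b.Interactive.*` rather than a separate `-interactive` benchmark name.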
Addressed in https://github.com/mlcommons/inference/pull/1978/files
Seems like result_token_per_second is in the summary.txt. Not sure why it's not in the details.txt.
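If it helps debug, here's a trivial way to see which loadgen output file actually carries the token metrics (assuming the standard mlperf_log_summary.txt / mlperf_log_detail.txt names in the current directory):

```python
# Minimal sketch: list every token-related line in the two loadgen logs,
# to see where result_token_per_second (or similar) actually shows up.
from pathlib import Path

for name in ("mlperf_log_summary.txt", "mlperf_log_detail.txt"):
    path = Path(name)
    if not path.exists():
        print(f"{name}: not found")
        continue
    hits = [line.strip() for line in path.read_text().splitlines()
            if "token" in line.lower()]
    print(f"{name}: {len(hits)} token-related lines")
    for line in hits:
        print("  ", line)
```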