
Reference implementations of MLPerf™ inference benchmarks

Results: 200 inference issues, sorted by recently updated

@attafosu @pgmpablo157321 please review and merge this.

In the v4.0 submission, we found in the **server** log that "result_token_throughput" is not reported properly, and most of the values are at the e-09 scale (@pgmpablo157321 feel free to check...
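
A quick way to confirm this is to scan the loadgen detail log for the key in question. Below is a minimal sketch, assuming the usual `mlperf_log_detail.txt` format where each record is a `:::MLLOG`-prefixed JSON object; the script itself is illustrative and not part of the reference implementation.

```python
import json
import sys

def token_throughputs(detail_log_path):
    """Yield every result_token_throughput value reported in a loadgen detail log."""
    with open(detail_log_path) as f:
        for line in f:
            # Detail-log records look like ':::MLLOG {"key": ..., "value": ..., ...}'
            if not line.startswith(":::MLLOG"):
                continue
            record = json.loads(line[len(":::MLLOG"):].strip())
            if record.get("key") == "result_token_throughput":
                yield record["value"]

if __name__ == "__main__":
    for value in token_throughputs(sys.argv[1]):
        # Values on the order of 1e-09 would reproduce the problem described above.
        print(f"result_token_throughput = {value:.3e}")
```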

As presented in https://docs.google.com/presentation/d/1Y_AKEJ6h1g5k3ntrL7nTazWw3xVDzJ_tjOGkLQ6VDMI/edit?usp=sharing, completed samples per second is a better representation of throughput than scheduled QPS. @pgmpablo157321 to help implement after the conclusion of v4.0
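
For reference, the two metrics differ only in what they count and over which window. A hedged sketch of the distinction (field names and helpers are made up for illustration, not the actual loadgen or checker code):

```python
def scheduled_qps(num_scheduled_queries, run_duration_s):
    """Scheduled QPS: queries loadgen issued per second, whether or not they finished."""
    return num_scheduled_queries / run_duration_s

def completed_samples_per_second(completion_times_s, samples_per_query=1):
    """Throughput based on work actually completed within the measured window."""
    window_s = max(completion_times_s) - min(completion_times_s)
    return len(completion_times_s) * samples_per_query / window_s
```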

[compliance_checker_log.txt](https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#563-inference) inside the results directory is mentioned as a requirement by the submission rules but is not enforced by the submission UI.
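
One way to enforce this would be a simple existence check over the submission tree; a minimal sketch, assuming the file is expected directly inside each results directory as the issue describes (the helper name is hypothetical):

```python
from pathlib import Path

def dirs_missing_compliance_log(submission_root):
    """Return every 'results' directory under the submission tree that lacks
    the compliance_checker_log.txt the submission rules ask for."""
    return [
        results_dir
        for results_dir in Path(submission_root).rglob("results")
        if results_dir.is_dir()
        and not (results_dir / "compliance_checker_log.txt").exists()
    ]
```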

Hi @arjunsuresh and @gfursin, I am facing errors in the "run benchmark" step of the text-to-image benchmark.
```
user@AIMLPerf-NVMe:~/CM/repos/local/cache/57064143a0ce4ff2/inference/text_to_image/model$ cd $SD_FOLDER
user@AIMLPerf-NVMe:~/CM/repos/local/cache/57064143a0ce4ff2/inference/text_to_image$ python3 main.py --dataset "coco-1024" --dataset-path coco2014 --profile stable-diffusion-xl-pytorch --model-path...
```

For llama benchmarks, the submission checker uses tokens per second for Offline, but samples per second for Server. https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L1385 However, the summary.csv still [uses](https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L2543-L2544) samples/second as the header to report...
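
A consistent fix would derive the summary.csv header from the same model/scenario logic used to pick the metric; a hedged sketch of that idea (the function and mapping below are illustrative, not the actual submission_checker code):

```python
def throughput_unit(model, scenario):
    """Pick the unit to print in summary.csv so it matches the metric
    the submission checker actually evaluates for this model/scenario."""
    if model.startswith("llama") and scenario == "Offline":
        return "Tokens/s"   # checker scores llama Offline on token throughput
    if scenario in ("Offline", "Server"):
        return "Samples/s"  # other Offline/Server results, including llama Server
    return "Latency (ms)"   # single-stream style scenarios report latency instead
```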

There was a discussion on how to make Early Stopping more user-friendly in https://github.com/mlcommons/inference/issues/1095. That issue was closed, however, without the proposal making it into actual policy or implementation. And in...

**Command:**
```
cmr "run mlperf inference generate-run-cmds _submission" --quiet --submitter="MLCommons" --hw_name=default --model=bert-99 --implementation=reference --backend=pytorch --device=cuda --scenario=Offline --adr.compiler.tags=gcc --target_qps=1 --category=edge --division=open --env.CM_VERIFY_SSL=false
```
**OS Version:** Ubuntu 22.04 with kernel 6.5.0 **CUDA Version:** 12.0...

It would be good to fix the compilation warnings seen when building loadgen.
```
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting...
```