
Reference implementations of MLPerf™ inference benchmarks

200 inference issues, sorted by recently updated

Hello mlcommons team, I want to run the "Automated command to run the benchmark via MLCommons CM" (from the example: https://github.com/mlcommons/inference/tree/master/language/llama2-70b), but I am getting the following error:
```
/root/mambaforge/bin/python3...
```

When the Mixtral Server scenario latency constraints are not met, the submission checker breaks with the error below:
```
File "/home/arjun/CM/repos/local/cache/f2ac2b26439f49be/inference/tools/submission/log_parser.py", line 44, in __init__
    raise RuntimeError("Encountered invalid line: {:}".format(line))...
```
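For context, the error above comes from a strict line-by-line parse of the MLPerf logs. A minimal sketch of that kind of check is below; the class name, file handling, and expected ":::MLLOG" line prefix are assumptions for illustration, not the actual tools/submission/log_parser.py code:
```
# Illustrative sketch of a strict log parser that raises on malformed lines,
# mirroring the RuntimeError quoted above. The ":::MLLOG" prefix and class
# name are assumptions, not the real tools/submission/log_parser.py code.
class StrictLogParser:
    def __init__(self, path):
        self.entries = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # A strict parser accepts only lines in the expected format;
                # anything unexpected (e.g. an error message written into the
                # log when latency constraints are violated) aborts parsing.
                if not line.startswith(":::MLLOG"):
                    raise RuntimeError(
                        "Encountered invalid line: {:}".format(line))
                self.entries.append(line)
```
Under this reading, a latency violation that writes a non-log line into the detail log would surface as the quoted RuntimeError rather than as a clear latency-constraint failure.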

We have already enabled TEST01 for SDXL. It wasn't mandatory for v4.0 (because the proposal came late), but it is mandatory for v4.1. https://github.com/mlcommons/inference/pull/1574 NVIDIA has checked internally and SDXL can be...

I installed CM successfully following the guide at https://docs.mlcommons.org/ck/install/ and then followed https://docs.mlcommons.org/inference/benchmarks/language/bert/ to run the script below:
```
cm run script --tags=run-mlperf,inference,_find-performance,_full \
   --model=bert-99 \
   --implementation=nvidia \
   --framework=tensorrt...
```

```
WARNING:Mixtral-8x7B-Instruct-v0.1-MAIN:Accuracy run will generate the accuracy logs, but the evaluation of the log is not completed yet
```

Hello mlcommons team, I want to run the "Automated command to run the benchmark via MLCommons CM" (from the example: https://github.com/mlcommons/inference/tree/master/language/llama2-70b), but I do not want to download llama2-70b, since...

The command used to run MLPerf inference for the resnet50 model on Ubuntu with ROCm is below:
```
cm run script --tags=run-mlperf,inference \
   --model=resnet50 \
   --implementation=reference \
   --framework=tensorflow \
   --category=edge \
   --scenario=Offline...
```

Clarified the steps to follow; the prerequisite step was not clear since it points to an external page.

Is there any reason why we have an [accuracy upper limit for LLAMA2 tokens per sample](https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L109) but not for GPT-J? It would be good to document the reason for users.
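For context, a bounded accuracy check of the kind being asked about could look like the sketch below; the threshold values and the result key are hypothetical placeholders, not the actual constants in submission_checker.py:
```
# Hypothetical sketch of an accuracy check with both a lower and an upper
# bound on tokens per sample. The numeric limits and the result key are
# placeholders; see submission_checker.py for the real values.
def check_tokens_per_sample(result, lower=250.0, upper=350.0):
    value = result["TOKENS_PER_SAMPLE"]
    if value < lower:
        raise ValueError(
            f"TOKENS_PER_SAMPLE {value} is below the lower limit {lower}")
    if value > upper:
        raise ValueError(
            f"TOKENS_PER_SAMPLE {value} exceeds the upper limit {upper}")
```
An upper limit guards against runs that emit unusually long generations, which can distort per-token throughput comparisons; presumably the question is whether GPT-J needs the same guard.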