inference
Reference implementations of MLPerf™ inference benchmarks
Hello mlcommons team, I want to run the "Automated command to run the benchmark via MLCommons CM" (from the example: https://github.com/mlcommons/inference/tree/master/language/llama2-70b), but I am getting the following error: ``` /root/mambaforge/bin/python3...
When the Mixtral server latency constraints are not met, the submission checker fails with the error below. ``` File "/home/arjun/CM/repos/local/cache/f2ac2b26439f49be/inference/tools/submission/log_parser.py", line 44, in __init__ raise RuntimeError("Encountered invalid line: {:}".format(line))...
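For readers unfamiliar with this kind of failure, here is a minimal sketch of how a strict log parser can end up raising that `RuntimeError`. It is not the actual `log_parser.py`: the `:::MLLOG` prefix, the `key`/`value` field names, the `result_validity` sample key, and the `strict` flag are all assumptions made for illustration.

```python
import json

# Assumed marker for structured log lines; free-form output lacks it.
MLLOG_PREFIX = ":::MLLOG"


def parse_log_lines(lines, strict=True):
    """Collect key/value pairs from marker-prefixed JSON log lines.

    `strict` is a hypothetical switch: when True, a malformed line aborts
    parsing; when False, the malformed line is skipped.
    """
    messages = {}
    for line in lines:
        line = line.strip()
        if not line.startswith(MLLOG_PREFIX):
            continue  # ignore lines that are not structured log entries
        payload = line[len(MLLOG_PREFIX):].strip()
        try:
            record = json.loads(payload)
            messages[record["key"]] = record["value"]
        except (json.JSONDecodeError, KeyError, TypeError):
            if strict:
                raise RuntimeError(
                    "Encountered invalid line: {:}".format(line))
            # lenient mode: drop the malformed line and keep going
    return messages


if __name__ == "__main__":
    good = ':::MLLOG {"key": "result_validity", "value": "INVALID"}'
    truncated = ':::MLLOG {"key": "x", "value": '
    print(parse_log_lines([good, truncated], strict=False))
```

Under these assumptions, a single truncated or interleaved log line is enough to abort a strict parser, which would be one possible explanation for the checker stopping instead of reporting the run as invalid.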
We have already enabled TEST01 for SDXL - wasn't mandatory for v4.0 (because the proposal came late), but mandatory for v4.1. https://github.com/mlcommons/inference/pull/1574 NVIDIA has checked internally and SDXL can be...
I installed CM successfully by following the guide at https://docs.mlcommons.org/ck/install/ and then followed https://docs.mlcommons.org/inference/benchmarks/language/bert/ to run the script below: cm run script --tags=run-mlperf,inference,_find-performance,_full \ --model=bert-99 \ --implementation=nvidia \ --framework=tensorrt...
``` WARNING:Mixtral-8x7B-Instruct-v0.1-MAIN:Accuracy run will generate the accuracy logs, but the evaluation of the log is not completed yet ```
Hello mlcommons team, I want to run the "Automated command to run the benchmark via MLCommons CM" (from the example: https://github.com/mlcommons/inference/tree/master/language/llama2-70b), but I do not want to download llama2-70b, since...
Getting the error message "unrecognized arguments: rocm" when running MLPerf inference on Ubuntu with ROCm
The command used to run MLPerf inference for the resnet50 model on Ubuntu with ROCm is below: cm run script --tags=run-mlperf,inference \ --model=resnet50 \ --implementation=reference \ --framework=tensorflow \ --category=edge \ --scenario=Offline...
Clarified the steps to follow. The prerequisite step was not clear because it points to an external page.
Is there any reason why we have an [accuracy upper limit for LLAMA2 Tokens per sample](https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L109) but not for GPT-J? It would be good to document the reason for users.