inference
Reference implementations of MLPerf™ inference benchmarks
After installing the Docker engine and the NVIDIA Container Toolkit with rootless mode configured, clone mlcommons/inference_results_v4.0.git, configure MLPERF_SCRATCH_PATH, and then build and launch a Docker container...
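A minimal sketch of that setup flow, assuming the NVIDIA submission layout in inference_results_v4.0 (the closed/NVIDIA directory and its make prebuild target are assumptions from that layout; the scratch path is a placeholder):

    # Clone the v4.0 results repo and enter the NVIDIA submission tree
    git clone https://github.com/mlcommons/inference_results_v4.0.git
    cd inference_results_v4.0/closed/NVIDIA

    # Scratch space for models, datasets, and preprocessed data (placeholder path)
    export MLPERF_SCRATCH_PATH=/path/to/scratch

    # Build the image and launch the benchmark container
    make prebuild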
The current implementations of GPT-J and BERT carry out prediction sequentially. Could the performance of GPT-J and BERT be improved by implementing parallel processing through threads rather...
I'm trying to run the RNN-T inference on MLPerf v4.0 with the commands below, and after a long time the whole terminal disappears without any error prompt. I'm not sure...
Does it support MobileNetV3?
error
(mlperf) susie.sun@yizhu-R5300-G5:~$ cmr "run mobilenet-models _tflite _accuracy-only" \ > --adr.compiler.tags=gcc \ > --results_dir=$HOME/mobilenet_results * cm run script "run mobilenet-models _tflite _accuracy-only" * cm run script "get sys-utils-cm" ! load /home/susie.sun/CM/repos/local/cache/576bc766a772475d/cm-cached-state.json...
Can it run on an AMD GPU?
Do you support a local path to the model?
(mlperf) susie.sun@yizhu-R5300-G5:~$ cmr "run mlperf inference generate-run-cmds _submission" --quiet --submitter="MLCommons" --hw_name=default --model=resnet50 --implementation=reference --backend=tf --device=gpu --scenario=Offline --adr.compiler.tags=gcc --target_qps=1 --category=edge --division=open * cm run script "run mlperf inference generate-run-cmds _submission" *...
I want to test the performance of the entire cluster on multiple GPUs. Is there an example of this?
The submission checker has two undefined variables: https://github.com/mlcommons/inference/blob/2a78b2fdc0407b70681771af94d68577f472db77/tools/submission/submission_checker.py#L1964-L1965