Failure when running BERT benchmark on NVIDIA Jetson AGX
After following the official guidelines on the NVIDIA Jetson AGX with a modified run command, namely:
cmr "generate-run-cmds inference _submission _all-scenarios" --model=bert-99 --device=cuda --implementation=nvidia-original --backend=tensorrt --execution-mode=valid --results_dir=$HOME/results_dir --category=edge --division=open --quiet --gpu_name=orin --adr.cuda.version=11.4
The following error persists:
* cm run script "generate-run-cmds inference _submission _all-scenarios"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python3"
* cm run script "get mlcommons inference src"
* cm run script "get sut description"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python3"
* cm run script "get compiler"
* cm run script "get cuda-devices"
* cm run script "get cuda _toolkit"
rm: cannot remove 'a.out': No such file or directory
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Compiling program ...
Running program ...
GPU Device ID: 0
GPU Name: Xavier
GPU compute capability: 7.2
CUDA driver version: 11.4
CUDA runtime version: 11.4
Global memory: 32517214208
Max clock rate: 1377.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
* cm run script "get generic-python-lib _package.dmiparser"
Generating SUT description file for xavier-tensorrt
Using MLCommons Inference source from /media/nvmedrive/CM/repos/local/cache/11d000419e2f449f/inference
Valid Scenarios for bert-99 in edge category are :['SingleStream', 'Offline']
Running loadgen scenario: SingleStream and mode: performance
* cm run script "app mlperf inference generic _nvidia-original _bert-99 _tensorrt _cuda _valid _r3.1_default _singlestream"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "detect os"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get cuda-devices"
* cm run script "get cuda _toolkit"
rm: cannot remove 'a.out': No such file or directory
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Compiling program ...
Running program ...
GPU Device ID: 0
GPU Name: Xavier
GPU compute capability: 7.2
CUDA driver version: 11.4
CUDA runtime version: 11.4
Global memory: 32517214208
Max clock rate: 1377.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
* cm run script "get dataset squad language-processing"
* cm run script "get dataset-aux squad-vocab"
* cm run script "reproduce mlperf nvidia inference _tensorrt _cuda _bert-99 _singlestream _orin"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "detect os"
* cm run script "get cuda _cudnn"
* cm run script "get tensorrt"
* cm run script "build nvidia inference server _nvidia-only"
* cm run script "get mlperf inference nvidia scratch space"
* cm run script "get generic-python-lib _mlperf_logging"
* cm run script "get ml-model bert _onnx _fp32"
* cm run script "get ml-model bert _onnx _int8"
* cm run script "get squad-vocab"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get nvidia mlperf inference common-code _nvidia-only"
* cm run script "generate user-conf mlperf inference"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get sut configs"
Using MLCommons Inference source from '/media/nvmedrive/CM/repos/local/cache/0269af6fbcb5417a/inference'
Original configuration value 1 target_latency
Adjusted configuration value 0.4 target_latency
Output Dir: '/home/jetson/results_dir/valid_results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/singlestream/performance/run_1'
bert.SingleStream.target_latency = 0.4
bert.SingleStream.max_duration = 660000
* cm run script "get generic-python-lib _transformers"
* cm run script "get generic-python-lib _safetensors"
* cm run script "get generic-python-lib _onnx"
* cm run script "reproduce mlperf inference nvidia harness _build_engine _batch_size.1 _tensorrt _cuda _bert-99 _singlestream _orin _bert_"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "detect os"
* cm run script "get cuda _cudnn"
* cm run script "get tensorrt"
* cm run script "build nvidia inference server _nvidia-only"
* cm run script "get mlperf inference nvidia scratch space"
* cm run script "get generic-python-lib _mlperf_logging"
* cm run script "get ml-model bert _onnx _fp32"
* cm run script "get ml-model bert _onnx _int8"
* cm run script "get squad-vocab"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get nvidia mlperf inference common-code _nvidia-only"
* cm run script "reproduce mlperf inference nvidia harness _preprocess_data _tensorrt _cuda _bert-99 _bert_"
* cm run script "get generic-python-lib _transformers"
* cm run script "get generic-python-lib _safetensors"
* cm run script "get generic-python-lib _onnx"
make generate_engines RUN_ARGS=' --benchmarks=bert --scenarios=singlestream --test_mode=PerformanceOnly --no_audit_verify '
[2023-08-21 15:00:40,008 main.py:231 INFO] Detected system ID: KnownSystem.Orin
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 233, in <module>
main(main_args, DETECTED_SYSTEM)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 104, in main
load_config_fn(benchmarks, scenarios)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 54, in populate_config_registry
ConfigRegistry.load_configs(benchmark, scenario)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/configuration.py", line 123, in load_configs
importlib.import_module(f"{base_module}.custom")
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/bert/SingleStream/custom.py", line 8, in <module>
class ORIN(SingleStreamGPUBaseConfig):
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/configuration.py", line 207, in _do_register
raise KeyError("Config for {} is already registered.".format("/".join(map(str, keyspace))))
KeyError: 'Config for Benchmark.BERT/Scenario.SingleStream/KnownSystem.Orin/WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP) is already registered.'
make: *** [Makefile:37: generate_engines] Error 1
CM error: Portable CM script failed (name = reproduce-mlperf-inference-nvidia, return code = 256)
Note that it is often a portability problem of the third-party tool or native script that is wrapped and unified by this CM script.
The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native script more portable, interoperable, deterministic and reproducible.
Please help the community by reporting the full log with the command line here:
* https://github.com/mlcommons/ck/issues
* https://cKnowledge.org/mlcommons-taskforce
The benchmark runs fine on my NVIDIA Orin, but on this Xavier AGX it fails to build engines for every benchmark I have tested.
In step 4 here, did you use --custom_system=no?
Adding --custom_system=no results in the following:
cmr "generate-run-cmds inference _submission _all-scenarios" --model=bert-99 --device=cuda --implementation=nvidia-original --backend=tensorrt --execution-mode=valid --results_dir=$HOME/results_dir --category=edge --division=open --quiet --gpu_name=orin --adr.cuda.version=11.4 --custom_system=no
* cm run script "generate-run-cmds inference _submission _all-scenarios"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python3"
* cm run script "get mlcommons inference src"
* cm run script "get sut description"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python3"
* cm run script "get compiler"
* cm run script "get cuda-devices"
* cm run script "get cuda _toolkit"
rm: cannot remove 'a.out': No such file or directory
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Compiling program ...
Running program ...
GPU Device ID: 0
GPU Name: Xavier
GPU compute capability: 7.2
CUDA driver version: 11.4
CUDA runtime version: 11.4
Global memory: 32517214208
Max clock rate: 1377.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
* cm run script "get generic-python-lib _package.dmiparser"
Generating SUT description file for xavier-tensorrt
Using MLCommons Inference source from /media/nvmedrive/CM/repos/local/cache/11d000419e2f449f/inference
Valid Scenarios for bert-99 in edge category are :['SingleStream', 'Offline']
Running loadgen scenario: SingleStream and mode: performance
* cm run script "app mlperf inference generic _nvidia-original _bert-99 _tensorrt _cuda _valid _r3.1_default _singlestream"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "detect os"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get cuda-devices"
* cm run script "get cuda _toolkit"
rm: cannot remove 'a.out': No such file or directory
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Compiling program ...
Running program ...
GPU Device ID: 0
GPU Name: Xavier
GPU compute capability: 7.2
CUDA driver version: 11.4
CUDA runtime version: 11.4
Global memory: 32517214208
Max clock rate: 1377.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
* cm run script "get dataset squad language-processing"
* cm run script "get dataset-aux squad-vocab"
* cm run script "reproduce mlperf nvidia inference _cuda _singlestream _tensorrt _bert-99 _orin"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "detect os"
* cm run script "get cuda _cudnn"
* cm run script "get tensorrt"
* cm run script "build nvidia inference server _nvidia-only"
* cm run script "get mlperf inference nvidia scratch space"
* cm run script "get generic-python-lib _mlperf_logging"
* cm run script "get ml-model bert _onnx _fp32"
* cm run script "get ml-model bert _onnx _int8"
* cm run script "get squad-vocab"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get nvidia mlperf inference common-code _nvidia-only"
* cm run script "generate user-conf mlperf inference"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get sut configs"
Using MLCommons Inference source from '/media/nvmedrive/CM/repos/local/cache/0269af6fbcb5417a/inference'
Original configuration value 1 target_latency
Adjusted configuration value 0.4 target_latency
Output Dir: '/home/jetson/results_dir/valid_results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/singlestream/performance/run_1'
bert.SingleStream.target_latency = 0.4
bert.SingleStream.max_duration = 660000
* cm run script "get generic-python-lib _transformers"
* cm run script "get generic-python-lib _safetensors"
* cm run script "get generic-python-lib _onnx"
* cm run script "reproduce mlperf inference nvidia harness _build_engine _batch_size.1 _cuda _singlestream _tensorrt _bert-99 _orin _bert_"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "detect os"
* cm run script "get cuda _cudnn"
* cm run script "get tensorrt"
* cm run script "build nvidia inference server _nvidia-only"
* cm run script "get mlperf inference nvidia scratch space"
* cm run script "get generic-python-lib _mlperf_logging"
* cm run script "get ml-model bert _onnx _fp32"
* cm run script "get ml-model bert _onnx _int8"
* cm run script "get squad-vocab"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get nvidia mlperf inference common-code _nvidia-only"
* cm run script "reproduce mlperf inference nvidia harness _preprocess_data _cuda _tensorrt _bert-99 _bert_"
* cm run script "get generic-python-lib _transformers"
* cm run script "get generic-python-lib _safetensors"
* cm run script "get generic-python-lib _onnx"
make generate_engines RUN_ARGS=' --benchmarks=bert --scenarios=singlestream --test_mode=PerformanceOnly --no_audit_verify '
[2023-08-21 17:11:28,147 main.py:231 INFO] Detected system ID: KnownSystem.Orin
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 233, in <module>
main(main_args, DETECTED_SYSTEM)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 104, in main
load_config_fn(benchmarks, scenarios)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 54, in populate_config_registry
ConfigRegistry.load_configs(benchmark, scenario)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/configuration.py", line 123, in load_configs
importlib.import_module(f"{base_module}.custom")
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/bert/SingleStream/custom.py", line 8, in <module>
class ORIN(SingleStreamGPUBaseConfig):
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/configuration.py", line 207, in _do_register
raise KeyError("Config for {} is already registered.".format("/".join(map(str, keyspace))))
KeyError: 'Config for Benchmark.BERT/Scenario.SingleStream/KnownSystem.Orin/WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP) is already registered.'
make: *** [Makefile:37: generate_engines] Error 1
CM error: Portable CM script failed (name = reproduce-mlperf-inference-nvidia, return code = 256)
Note that it is often a portability problem of the third-party tool or native script that is wrapped and unified by this CM script.
The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native script more portable, interoperable, deterministic and reproducible.
Please help the community by reporting the full log with the command line here:
* https://github.com/mlcommons/ck/issues
* https://cKnowledge.org/mlcommons-taskforce
Edit: The same error occurs when adding --env.OUTSIDE_MLPINF_ENV=1 as well.
The same goes for ResNet50.
Run command:
cmr "generate-run-cmds inference _performance-only" --model=resnet50 --device=cuda --implementation=nvidia-original --backend=tensorrt --results_dir=$HOME/results_dir --category=edge --division=open --quiet --gpu_name=orin --adr.cuda.version=11.4 --adr.nvidia-harness.input_format=linear --custom.system=no
Finished preprocessing all the datasets!
make generate_engines RUN_ARGS=' --benchmarks=resnet50 --scenarios=offline --test_mode=PerformanceOnly --input_format=linear --no_audit_verify '
[2023-08-21 17:32:39,728 main.py:231 INFO] Detected system ID: KnownSystem.Orin
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 233, in <module>
main(main_args, DETECTED_SYSTEM)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 104, in main
load_config_fn(benchmarks, scenarios)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/code/main.py", line 54, in populate_config_registry
ConfigRegistry.load_configs(benchmark, scenario)
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/configuration.py", line 123, in load_configs
importlib.import_module(f"{base_module}.custom")
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/resnet50/Offline/custom.py", line 8, in <module>
class ORIN(OfflineGPUBaseConfig):
File "/media/nvmedrive/CM/repos/local/cache/67269502e095480a/repo/closed/NVIDIA/configs/configuration.py", line 207, in _do_register
raise KeyError("Config for {} is already registered.".format("/".join(map(str, keyspace))))
KeyError: 'Config for Benchmark.ResNet50/Scenario.Offline/KnownSystem.Orin/WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP) is already registered.'
make: *** [Makefile:37: generate_engines] Error 1
CM error: Portable CM script failed (name = reproduce-mlperf-inference-nvidia, return code = 256)
Note that it is often a portability problem of the third-party tool or native script that is wrapped and unified by this CM script.
The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native script more portable, interoperable, deterministic and reproducible.
Please help the community by reporting the full log with the command line here:
* https://github.com/mlcommons/ck/issues
* https://cKnowledge.org/mlcommons-taskforce
"--custom_system=no" should be done in the previous step as mentioned in my previous comment and not while running the benchmarks. Currently there are custom config generated for all the models and it's hard to revert. Probably best is to clean the cache and start again.
cm rm cache - f
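To illustrate why the leftover custom configs break the build, here is a minimal sketch of the duplicate-registration behaviour the traceback points at (class and method names are taken from the traceback; the body is an assumption, not the actual NVIDIA source):

class ConfigRegistry:
    _registry = {}

    @classmethod
    def _do_register(cls, keyspace, config):
        # keyspace ~ (Benchmark.BERT, Scenario.SingleStream, KnownSystem.Orin, workload_setting)
        key = "/".join(map(str, keyspace))
        if key in cls._registry:
            # a stale custom config plus the built-in Orin config register the same key
            raise KeyError("Config for {} is already registered.".format(key))
        cls._registry[key] = config
        return config

Cleaning the cache removes the generated custom configs, so only one registration per key remains.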
Oh yeah, I misread, apologies. Those instructions lead to an infinite loop of having to find TensorRT.
The following process:
cm rm cache -f
cm run script --tags=get,nvidia,common-code,_custom --out=json
cm run script --tags=get,tensorrt --input=<PATH_TO_TENSORRT_TAR_FILE>
cm run script --tags=build,nvidia,inference,server,custom_system=no
Leads to:
cm run script --tags=build,nvidia,inference,server, --custom_system=no
* cm run script "build nvidia inference server"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "get python3"
* cm run script "get cuda _cudnn"
* cm run script "get tensorrt _dev"
* cm run script "detect os"
* cm run script "get python3"
CM error: Please envoke cm run script "get tensorrt" --tar_file={full path to the TensorRT tar file}!
And when trying to re-run get tensorrt, it produces:
cmr "get tensorrt" --tar_file=/home/jetson/CM/TensorRT-8.5.2.2.Ubuntu-20.04.aarch64-gnu.cuda-11.8.cudnn8.6.tar
* cm run script "get tensorrt"
I should mention that, just to get the original BERT command running, one has to perform a few hacks.
First and foremost, an error is raised stating that an NVIDIA driver >= 515 is needed. The problem is that nvidia-smi is used to query which driver is actually installed, and nvidia-smi is typically not usable on Jetson. One therefore has to change line 46 in /media/nvmedrive/CM/repos/local/cache/<CACHE_ID>/repo/closed/NVIDIA/Makefile.const such that IS_SOC = 1 is set.
Re-running the command then reports that the system is not an SoC. This is because the device ID is queried, which is hard-coded to 87 for the Orin device. Since the Xavier certainly is an SoC, one has to add:
SOC_SM = 87
in:
/media/nvmedrive/CM/repos/local/cache/<CACHE_ID>/repo/closed/NVIDIA/Makefile.docker
This makes the original BERT command actually compile, but in the end it still fails to build engines.
The reasoning behind adding SOC_SM = 87 is to pass the if-else clause at line 144 in Makefile.docker, and this also makes sure that the if-else clause at lines 174-180 in Makefile.build does not fail.
@JoachimMoe The command is searching for "get tensorrt _dev" but the installation was done for "get tensorrt". This PR should fix the output message. For now, can you please try the following?
cmr "get tensorrt _dev" --tar_file=/home/jetson/CM/TensorRT-8.5.2.2.Ubuntu-20.04.aarch64-gnu.cuda-11.8.cudnn8.6.tar
Following and combining these commands, the benchmark executes correctly for the first time. There are problems, however. The following error is given:
Run Directory: /media/nvmedrive/CM/repos/local/cache/f50c6e500ca24f58/repo/closed/NVIDIA
CMD: make run_harness RUN_ARGS=' --benchmarks=bert --scenarios=offline --test_mode=PerformanceOnly --user_conf_path=/home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/generate-mlperf-inference-user-conf/tmp/e070ee5c8f9c4571902c25a40e3d40f3.conf --mlperf_conf_path=/media/nvmedrive/CM/repos/local/cache/e59beac7bb994e63/inference/mlperf.conf --no_audit_verify '
[2023-08-22 14:00:42,063 main.py:229 INFO] Detected system did not match any known systems. Exiting. SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='ARMv8 Processor rev 0 (v8l)', architecture=<CPUArchitecture.aarch64: AliasedName(name='aarch64', aliases=(), patterns=())>, core_count=2, threads_per_core=1): 4}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=31.755092, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=31755092000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='Jetson-AGX', accelerator_type=<AcceleratorType.Integrated: AliasedName(name='Integrated', aliases=(), patterns=())>, vram=None, max_power_limit=None, pci_id=None, compute_sm=72): 1})), numa_conf=None, system_id=None)
======================== Result summaries: ========================
Running loadgen scenario: Offline and mode: accuracy
* cm run script "app mlperf inference generic _nvidia-original _bert-99 _tensorrt _cuda _valid _r3.1_default _offline"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get cuda-devices"
* cm run script "get cuda _toolkit"
rm: cannot remove 'a.out': No such file or directory
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Compiling program ...
Running program ...
GPU Device ID: 0
GPU Name: Xavier
GPU compute capability: 7.2
CUDA driver version: 11.4
CUDA runtime version: 11.4
Global memory: 32517214208
Max clock rate: 1377.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
* cm run script "get dataset squad language-processing"
* cm run script "get dataset-aux squad-vocab"
* cm run script "reproduce mlperf nvidia inference _offline _tensorrt _bert-99 _cuda _orin"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "get cuda _cudnn"
* cm run script "get tensorrt"
* cm run script "build nvidia inference server _nvidia-only"
* cm run script "get mlperf inference nvidia scratch space"
* cm run script "get generic-python-lib _mlperf_logging"
* cm run script "get ml-model bert _onnx _fp32"
* cm run script "get ml-model bert _onnx _int8"
* cm run script "get squad-vocab"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get nvidia mlperf inference common-code _nvidia-only"
* cm run script "generate user-conf mlperf inference"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get sut configs"
Using MLCommons Inference source from '/media/nvmedrive/CM/repos/local/cache/e59beac7bb994e63/inference'
Original configuration value 1.0 target_qps
Adjusted configuration value 1.01 target_qps
Output Dir: '/home/jetson/results_dir/valid_results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy'
bert.Offline.target_qps = 1.01
* cm run script "get generic-python-lib _transformers"
* cm run script "get generic-python-lib _safetensors"
* cm run script "get generic-python-lib _onnx"
* cm run script "reproduce mlperf inference nvidia harness _build_engine _offline _tensorrt _bert-99 _cuda _orin _bert_"
* cm run script "reproduce mlperf inference nvidia harness _preprocess_data _tensorrt _bert-99 _cuda _orin _bert_"
* cm run script "benchmark-mlperf"
* cm run script "benchmark-program program"
* cm run script "detect cpu"
* cm run script "detect os"
***************************************************************************
CM script::benchmark-program/run.sh
Run Directory: /media/nvmedrive/CM/repos/local/cache/f50c6e500ca24f58/repo/closed/NVIDIA
CMD: make run_harness RUN_ARGS=' --benchmarks=bert --scenarios=offline --test_mode=AccuracyOnly --user_conf_path=/home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/generate-mlperf-inference-user-conf/tmp/4781ee6ad96844418b446e69657d2c18.conf --mlperf_conf_path=/media/nvmedrive/CM/repos/local/cache/e59beac7bb994e63/inference/mlperf.conf --no_audit_verify '
This means that neither the mlperf_submission directory nor the results_dir directory contains any files.
That's the expected output :-) because you're running on a Xavier and the NVIDIA code no longer supports it.
[2023-08-22 14:00:42,063 main.py:229 INFO] Detected system did not match any known systems. Exiting. SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='ARMv8 Processor rev 0 (v8l)', architecture=<CPUArchitecture.aarch64: AliasedName(name='aarch64', aliases=(), patterns=())>, core_count=2, threads_per_core=1): 4}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=31.755092, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=31755092000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='Jetson-AGX', accelerator_type=<AcceleratorType.Integrated: AliasedName(name='Integrated', aliases=(), patterns=())>, vram=None, max_power_limit=None, pci_id=None, compute_sm=72): 1})), numa_conf=None, system_id=None)
Now you should make some code changes to get it working. Since this has never been tried on Xavier before, I have no guarantee that it'll work, but it is worth a try.
cd /media/nvmedrive/CM/repos/local/cache/f50c6e500ca24f58/repo/closed/NVIDIA
In code/common/systems/system_list.py you can try adding the Xavier system like it is done here:
https://github.com/mlcommons/inference_results_v2.0/blob/master/closed/NVIDIA/code/common/systems/system_list.py#L128
After that you'll still need to add the benchmark/scenario-specific configs for Xavier like it is done here: https://github.com/mlcommons/inference_results_v2.0/blob/master/closed/NVIDIA/configs/bert/Offline/init.py#L1015
It looks as if system_list.py already contains some Xavier AGX system configurations, i.e.:
add_systems("AGX_Xavier", "AGX_Xavier",
KnownCPU.NVIDIA_Carmel_ARM_V8.value,
KnownGPU.AGX_Xavier.value, [1], Memory(30, ByteSuffix.GiB),
target_dict=_deprecated_systems)
add_systems("Xavier_NX", "Xavier_NX",
KnownCPU.NVIDIA_Carmel_ARM_V8.value,
KnownGPU.Xavier_NX.value, [1], Memory(7, ByteSuffix.GiB),
target_dict=_deprecated_systems)
As seen, these have target_dict set to _deprecated_systems, whereas the Orin is defined as:
# Embedded systems
add_systems("Orin", "Orin", KnownCPU.ARM_V8_Generic.value, KnownGPU.Orin.value,
            [1], Memory(7, ByteSuffix.GiB))
add_systems("Orin_NX", "Orin_NX", KnownCPU.ARM_V8_Generic.value, KnownGPU.Orin_NX.value,
            [1], Memory(7, ByteSuffix.GiB))
The function signature states that the parameter "target_dict" is the dictionary the system configuration is added to. Simply removing that argument from the AGX_Xavier call should therefore make it default to _system_confs, as in the sketch below.
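A sketch of the modified call with the target_dict argument simply dropped (the positional arguments are copied from the snippet above; this is an assumption, not verified against the current source):

add_systems("AGX_Xavier", "AGX_Xavier",
            KnownCPU.NVIDIA_Carmel_ARM_V8.value,
            KnownGPU.AGX_Xavier.value, [1], Memory(30, ByteSuffix.GiB))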
Similarly, in the __init__.py file there are ConfigRegistry.register decorators for the Orin, and simply adding:
@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class AGX_Xavier(OfflineGPUBaseConfig):
    system = KnownSystem.AGX_Xavier
    enable_interleaved = True
    use_small_tile_gemm_plugin = False
    gpu_batch_size = 8
    gpu_copy_streams = 1
    gpu_inference_streams = 1
    offline_expected_qps = 97


@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99_9, PowerSetting.MaxP)
class AGX_Xavier_HighAccuracy(AGX_Xavier):
    precision = "fp16"
    offline_expected_qps = 50


@ConfigRegistry.register(HarnessType.Triton, AccuracyTarget.k_99, PowerSetting.MaxP)
class AGX_Xavier_Triton(AGX_Xavier):
    use_triton = True


@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxQ)
class AGX_Xavier_MaxQ(AGX_Xavier):
    offline_expected_qps = 61

    # power settings
    soc_gpu_freq = 828750000
    soc_dla_freq = 115200000
    soc_cpu_freq = 1190400
    soc_emc_freq = 1600000000


@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99_9, PowerSetting.MaxQ)
class AGX_Xavier_HighAccuracy_MaxQ(AGX_Xavier_MaxQ):
    precision = "fp16"
    offline_expected_qps = 31
This should be sufficient for the Xavier AGX in the BERT Offline case.
The question is: how do I update my current configuration to "AGX_Xavier" so that it matches the now-modified files?
It should be added that simply running the command again results in the following:
======================= Result summaries: ========================
Running loadgen scenario: Offline and mode: accuracy
* cm run script "app mlperf inference generic _nvidia-original _bert-99 _tensorrt _cuda _valid _r3.1_default _offline"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get cuda-devices"
* cm run script "get cuda _toolkit"
rm: cannot remove 'a.out': No such file or directory
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Compiling program ...
Running program ...
GPU Device ID: 0
GPU Name: Xavier
GPU compute capability: 7.2
CUDA driver version: 11.4
CUDA runtime version: 11.4
Global memory: 32517214208
Max clock rate: 1377.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
* cm run script "get dataset squad language-processing"
* cm run script "get dataset-aux squad-vocab"
* cm run script "reproduce mlperf nvidia inference _offline _cuda _tensorrt _bert-99 _orin"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get sys-utils-cm"
* cm run script "get cuda _cudnn"
* cm run script "get tensorrt"
* cm run script "build nvidia inference server _nvidia-only"
* cm run script "get mlperf inference nvidia scratch space"
* cm run script "get generic-python-lib _mlperf_logging"
* cm run script "get ml-model bert _onnx _fp32"
* cm run script "get ml-model bert _onnx _int8"
* cm run script "get squad-vocab"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get nvidia mlperf inference common-code _nvidia-only"
* cm run script "generate user-conf mlperf inference"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python"
* cm run script "get mlcommons inference src _deeplearningexamples"
* cm run script "get sut configs"
Using MLCommons Inference source from '/media/nvmedrive/CM/repos/local/cache/e904107894e34ef8/inference'
Original configuration value 1.0 target_qps
Adjusted configuration value 1.01 target_qps
Output Dir: '/home/jetson/results_dir/valid_results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy'
bert.Offline.target_qps = 1.01
* cm run script "get generic-python-lib _transformers"
* cm run script "get generic-python-lib _safetensors"
* cm run script "get generic-python-lib _onnx"
* cm run script "reproduce mlperf inference nvidia harness _build_engine _offline _cuda _tensorrt _bert-99 _orin _bert_"
* cm run script "reproduce mlperf inference nvidia harness _preprocess_data _cuda _tensorrt _bert-99 _orin _bert_"
* cm run script "benchmark-mlperf"
* cm run script "benchmark-program program"
* cm run script "detect cpu"
* cm run script "detect os"
***************************************************************************
CM script::benchmark-program/run.sh
Run Directory: /media/nvmedrive/CM/repos/local/cache/153c6fb5374443c4/repo/closed/NVIDIA
CMD: make run_harness RUN_ARGS=' --benchmarks=bert --scenarios=offline --test_mode=AccuracyOnly --user_conf_path=/home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/generate-mlperf-inference-user-conf/tmp/936cd85cd69446279da681717f596b99.conf --mlperf_conf_path=/media/nvmedrive/CM/repos/local/cache/e904107894e34ef8/inference/mlperf.conf --no_audit_verify '
[2023-08-24 10:15:51,002 main.py:229 INFO] Detected system did not match any known systems. Exiting. SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='ARMv8 Processor rev 0 (v8l)', architecture=<CPUArchitecture.aarch64: AliasedName(name='aarch64', aliases=(), patterns=())>, core_count=2, threads_per_core=1): 4}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=31.755092, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=31755092000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='Jetson-AGX', accelerator_type=<AcceleratorType.Integrated: AliasedName(name='Integrated', aliases=(), patterns=())>, vram=None, max_power_limit=None, pci_id=None, compute_sm=72): 1})), numa_conf=None, system_id=None)
======================== Result summaries: ========================
* cm run script "generate mlperf inference submission"
* cm run script "get python3"
* cm run script "mlcommons inference src"
* cm run script "get sut system-description"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python3"
* cm run script "get compiler"
* cm run script "get cuda-devices"
* cm run script "get cuda _toolkit"
rm: cannot remove 'a.out': No such file or directory
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Compiling program ...
Running program ...
GPU Device ID: 0
GPU Name: Xavier
GPU compute capability: 7.2
CUDA driver version: 11.4
CUDA runtime version: 11.4
Global memory: 32517214208
Max clock rate: 1377.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
* cm run script "get generic-python-lib _package.dmiparser"
Generating SUT description file for xavier-tensorrt
* MLPerf inference submission dir: /home/jetson/mlperf_submission
* MLPerf inference results dir: /home/jetson/results_dir/valid_results
* MLPerf inference division: open
* MLPerf inference submitter: cTuning
* System: xavier
* Implementation: nvidia_original
* Device: gpu
* Framework: tensorrt
* Framework Version: vdefault
* Run Config: default_config
* MLPerf inference model: bert-99
* cm run script "accuracy truncate mlc"
* cm run script "get python3"
* cm run script "get mlcommons inference src"
python3 '/media/nvmedrive/CM/repos/local/cache/1223641e01fd4e9a/inference/tools/submission/truncate_accuracy_log.py' --input '/home/jetson/mlperf_submission' --submitter 'cTuning' --backup '/home/jetson/mlperf_submission_logs'
ERROR:main:open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json missing
ERROR:main:open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json missing
ERROR:main:open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/singlestream/accuracy/mlperf_log_accuracy.json missing
ERROR:main:open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/singlestream/accuracy/mlperf_log_accuracy.json missing
ERROR:main:no submission in open/cTuning/compliance
INFO:main:Make sure you keep a backup of /home/jetson/mlperf_submission_logs in case mlperf wants to see the original accuracy logs
* cm run script "submission checker mlc"
* cm run script "get python3"
* cm run script "get mlcommons inference src"
* cm run script "get generic-python-lib _xlsxwriter"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python3"
* cm run script "get generic-python-lib _pip"
/usr/bin/python3 /home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/get-generic-python-lib/detect-version.py > tmp-ver.out 2> tmp-ver.err
Detected version: 3.1.2
/usr/bin/python3 /home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/get-generic-python-lib/detect-version.py > tmp-ver.out 2> tmp-ver.err
* cm run script "get generic-python-lib _pandas"
* cm run script "detect os"
* cm run script "detect cpu"
* cm run script "detect os"
* cm run script "get python3"
* cm run script "get generic-python-lib _pip"
- Searching for versions: >= 1.0.0
/usr/bin/python3 /home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/get-generic-python-lib/detect-version.py > tmp-ver.out 2> tmp-ver.err
Detected version: 2.0.3
/usr/bin/python3 /home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/get-generic-python-lib/detect-version.py > tmp-ver.out 2> tmp-ver.err
/usr/bin/python3 /media/nvmedrive/CM/repos/local/cache/1223641e01fd4e9a/inference/tools/submission/submission_checker.py --input /home/jetson/mlperf_submission --submitter cTuning
[2023-08-24 10:15:58,941 submission_checker.py:2401 ERROR] open/cTuning has the following empty directories: ['open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy', 'open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/performance/run_1', 'open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/singlestream/accuracy', 'open/cTuning/results/xavier-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/singlestream/performance/run_1']
[2023-08-24 10:15:58,941 submission_checker.py:3266 INFO] ---
[2023-08-24 10:15:58,941 submission_checker.py:3272 INFO] ---
[2023-08-24 10:15:58,942 submission_checker.py:3275 ERROR] NoResults open/cTuning
[2023-08-24 10:15:58,942 submission_checker.py:3357 INFO] ---
[2023-08-24 10:15:58,942 submission_checker.py:3358 INFO] Results=0, NoResults=1, Power Results=0
[2023-08-24 10:15:58,942 submission_checker.py:3365 INFO] ---
[2023-08-24 10:15:58,942 submission_checker.py:3366 INFO] Closed Results=0, Closed Power Results=0
[2023-08-24 10:15:58,942 submission_checker.py:3371 INFO] Open Results=0, Open Power Results=0
[2023-08-24 10:15:58,942 submission_checker.py:3376 INFO] Network Results=0, Network Power Results=0
[2023-08-24 10:15:58,942 submission_checker.py:3381 INFO] ---
[2023-08-24 10:15:58,942 submission_checker.py:3383 INFO] Systems=0, Power Systems=0
[2023-08-24 10:15:58,942 submission_checker.py:3384 INFO] Closed Systems=0, Closed Power Systems=0
[2023-08-24 10:15:58,943 submission_checker.py:3389 INFO] Open Systems=0, Open Power Systems=0
[2023-08-24 10:15:58,943 submission_checker.py:3394 INFO] Network Systems=0, Network Power Systems=0
[2023-08-24 10:15:58,943 submission_checker.py:3399 INFO] ---
[2023-08-24 10:15:58,943 submission_checker.py:3401 ERROR] SUMMARY: submission has errors
CM error: Portable CM script failed (name = run-mlperf-inference-submission-checker, return code = 256)
Note that it is often a portability problem of the third-party tool or native script that is wrapped and unified by this CM script.
The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native script more portable, interoperable, deterministic and reproducible.
Please help the community by reporting the full log with the command line here:
* https://github.com/mlcommons/ck/issues
* https://cKnowledge.org/mlcommons-taskforce
Thank you!
Detected system did not match any known systems. Exiting.
This means that the system detection is not matching any of the configured systems. So the only option will be to do cm run script --tags=add,custom,system,nvidia and create custom configs. When this is done, the custom configurations will need to be manually updated for each model and scenario (a rough sketch is given below).
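For illustration only, a manually filled custom config for BERT SingleStream might look roughly like this (the class name, base class and values are assumptions modeled on the AGX_Xavier configs quoted earlier in this thread, not the file generated by the script):

@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class AGX_XAVIER(SingleStreamGPUBaseConfig):
    system = KnownSystem.AGX_Xavier
    gpu_batch_size = 1
    # placeholder latency target, to be tuned on the actual device
    single_stream_expected_latency_ns = 6400000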
After some tinkering and attempts to create custom configs, the error now looks like the following:
CM script::benchmark-program/run.sh
Run Directory: /media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA
CMD: make run_harness RUN_ARGS=' --benchmarks=resnet50 --scenarios=offline --test_mode=PerformanceOnly --user_conf_path=/home/jetson/CM/repos/mlcommons@ck/cm-mlops/script/generate-mlperf-inference-user-conf/tmp/1f4f7c0148a1438cb96beb12c3eeba6a.conf --mlperf_conf_path=/media/nvmedrive/CM/repos/local/cache/663a46e2430e4b7d/inference/mlperf.conf --input_format=linear --no_audit_verify '
[2023-08-29 14:24:01,606 main.py:231 INFO] Detected system ID: KnownSystem.AGX_Xavier
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/main.py", line 233, in <module>
main(main_args, DETECTED_SYSTEM)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/main.py", line 146, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/main.py", line 204, in dispatch_action
handler.run()
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/actionhandler/base.py", line 75, in run
self.setup()
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 85, in setup
duper.run()
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/common/protected_super.py", line 137, in _f
return r(*args, **kwargs)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/actionhandler/base.py", line 75, in run
self.setup()
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/actionhandler/generate_conf_files.py", line 70, in setup
self.harness, self.benchmark_conf = get_harness(self.benchmark_conf, None)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/__init__.py", line 124, in get_harness
harness = get_cls(G_HARNESS_CLASS_MAP[k])(config, benchmark)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/common/lwis_harness.py", line 31, in __init__
super().__init__(args, benchmark)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/common/harness.py", line 100, in __init__
self.enumerate_engines()
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/common/harness.py", line 193, in enumerate_engines
self.check_file_exists(self.gpu_engine)
File "/media/nvmedrive/CM/repos/local/cache/5e509c6519034851/repo/closed/NVIDIA/code/common/harness.py", line 216, in check_file_exists
raise RuntimeError("File {:} does not exist.".format(f))
RuntimeError: File ./build/engines/AGX_Xavier/resnet50/Offline/resnet50-Offline-gpu-b0-int8.lwis_k_99_MaxP.plan does not exist.
make: *** [Makefile:45: run_harness] Error 1
Any ideas on how to proceed?
@JoachimMoe Can you please do cm rm cache --tags=harness -f and add --adr.nvidia-harness.tags=_batch_size.64 to the run command? Currently it is taking b0 (resnet50-Offline-gpu-b0-int8.lwis_k_99_MaxP.plan), which is the reason for the failure. This is because the newly generated config files are empty by default; adding gpu_batch_size in the config files directly is another way to solve this issue, as sketched below.
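As a rough sketch of that second option (the file, class name and decorator are assumptions based on the error message and the config patterns quoted earlier, not the actual generated file), the otherwise empty configs/resnet50/Offline/custom.py entry could be filled in along these lines:

@ConfigRegistry.register(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
class AGX_XAVIER(OfflineGPUBaseConfig):
    system = KnownSystem.AGX_Xavier
    gpu_batch_size = 64       # supplies the batch size the harness was missing (b0 -> b64)
    gpu_copy_streams = 1      # placeholder values, to be tuned
    gpu_inference_streams = 1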