retinanet run_harness fails with '[executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3'
Trying to run offline retinanet in a container with one NVIDIA L4 GPU:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev  --model=retinanet    --implementation=nvidia    --framework=tensorrt    --category=datacenter    --scenario=Offline    --execution_mode=test    --device=cuda     --gpu_name=l4 --docker_cache=no --quiet    --test_query_count=500
Execution fails with: [E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
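For context: TensorRT lets each optimization profile be owned by at most one live IExecutionContext at a time, and the config dump below shows `num_profiles : 1`, so any second context selecting profile 0 would fail exactly like this. A minimal sketch of that contract (my own illustration, not the lwis harness code; `makeContexts` is a made-up helper):

```cpp
#include <NvInfer.h>
#include <vector>

// Sketch only: each concurrently live IExecutionContext must own a
// distinct optimization profile. If the engine was built with a single
// profile (num_profiles : 1), the second loop iteration fails with the
// same Error Code 3 seen in the log.
std::vector<nvinfer1::IExecutionContext*> makeContexts(
    nvinfer1::ICudaEngine& engine, int numContexts)
{
    std::vector<nvinfer1::IExecutionContext*> contexts;
    for (int i = 0; i < numContexts; ++i)
    {
        nvinfer1::IExecutionContext* ctx = engine.createExecutionContext();
        // setOptimizationProfile(i) succeeds only while profile i is not
        // held by another context, so the engine needs at least
        // numContexts profiles (engine.getNbOptimizationProfiles()).
        ctx->setOptimizationProfile(i);
        contexts.push_back(ctx);
    }
    return contexts;
}
```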
Full error:
CMD: make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline --test_mode=PerformanceOnly --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/benchmark-program/run-ubuntu.sh from tmp-run.sh
make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline  --test_mode=PerformanceOnly  --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify  ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'
[2024-10-02 15:10:05,960 main.py:229 INFO] Detected system ID: KnownSystem.e1ef67ab5fc2
[2024-10-02 15:10:06,139 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2024-10-02 15:10:06,139 generate_conf_files.py:107 INFO] Generated measurements/ entries for e1ef67ab5fc2_TRT/retinanet/Offline
[2024-10-02 15:10:06,140 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf" --gpu_engines="./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms
[2024-10-02 15:10:06,140 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/data
gpu_batch_size : 2
input_dtype : int8
input_format : linear
log_dir : /root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/build/logs/2024.10.02-15.10.04
map_path : data_maps/open-images-v6-mlperf/val_map.txt
mlperf_conf_path : /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
offline_expected_qps : 1.0
precision : int8
preprocessed_data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9J14 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=1, threads_per_core=1): 64}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=292.215448, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=292215448000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA L4', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=22.494140625, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=24152899584), max_power_limit=72.0, pci_id='0x27B810DE', compute_sm=89): 1})), numa_conf=None, system_id='e1ef67ab5fc2')
tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
test_mode : PerformanceOnly
use_graphs : False
user_conf_path : /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
system_id : e1ef67ab5fc2
config_name : e1ef67ab5fc2_retinanet_Offline
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
inference_server : lwis
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
[I] user.conf path: /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 73 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 126, GPU 473 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 128, GPU 483 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
[I] Device:0.GPU: [0] ./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 55, GPU 485 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 55, GPU 493 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1528, now: CPU 0, GPU 1596 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 56, GPU 2029 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 56, GPU 2039 (MiB)
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +1528, now: CPU 1, GPU 3124 (MiB)
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
F1002 15:10:07.041591 180493 lwis.cpp:245] Check failed: context->setOptimizationProfile(profileIdx) == true (0 vs. 1)
*** Check failure stack trace: ***
@     0x7fe94f4401c3  google::LogMessage::Fail()
@     0x7fe94f44525b  google::LogMessage::SendToLog()
@     0x7fe94f43febf  google::LogMessage::Flush()
@     0x7fe94f4406ef  google::LogMessageFatal::~LogMessageFatal()
@     0x55918ac33adc  lwis::Device::Setup()
@     0x55918ac35cab  lwis::Server::Setup()
@     0x55918ab91a00  doInference()
@     0x55918ab8f2b0  main
@     0x7fe93d00e083  __libc_start_main
@     0x55918ab8f83e  _start
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 231, in <module>