retinanet run_harness fails with '[executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3'
Trying to run offline retinanet in a container with one NVIDIA L4 GPU:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev  --model=retinanet    --implementation=nvidia    --framework=tensorrt    --category=datacenter    --scenario=Offline    --execution_mode=test    --device=cuda     --gpu_name=l4 --docker_cache=no --quiet    --test_query_count=500
Execution fails with: [E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
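For context: TensorRT lets each optimization profile be owned by at most one live IExecutionContext at a time, and the config dump below shows `num_profiles : 1`, so any second context selecting profile 0 would fail exactly like this. A minimal sketch of that contract (my own illustration, not the lwis harness code; `makeContexts` is a made-up helper):

```cpp
#include <NvInfer.h>
#include <vector>

// Sketch only: each concurrently live IExecutionContext must own a
// distinct optimization profile. If the engine was built with a single
// profile (num_profiles : 1), the second loop iteration fails with the
// same Error Code 3 seen in the log.
std::vector<nvinfer1::IExecutionContext*> makeContexts(
    nvinfer1::ICudaEngine& engine, int numContexts)
{
    std::vector<nvinfer1::IExecutionContext*> contexts;
    for (int i = 0; i < numContexts; ++i)
    {
        nvinfer1::IExecutionContext* ctx = engine.createExecutionContext();
        // setOptimizationProfile(i) succeeds only while profile i is not
        // held by another context, so the engine needs at least
        // numContexts profiles (engine.getNbOptimizationProfiles()).
        ctx->setOptimizationProfile(i);
        contexts.push_back(ctx);
    }
    return contexts;
}
```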
Full error:
CMD: make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline --test_mode=PerformanceOnly --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/benchmark-program/run-ubuntu.sh from tmp-run.sh
make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline  --test_mode=PerformanceOnly  --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify  ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'
[2024-10-02 15:10:05,960 main.py:229 INFO] Detected system ID: KnownSystem.e1ef67ab5fc2
[2024-10-02 15:10:06,139 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2024-10-02 15:10:06,139 generate_conf_files.py:107 INFO] Generated measurements/ entries for e1ef67ab5fc2_TRT/retinanet/Offline
[2024-10-02 15:10:06,140 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf" --gpu_engines="./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms
[2024-10-02 15:10:06,140 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/data
gpu_batch_size : 2
input_dtype : int8
input_format : linear
log_dir : /root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/build/logs/2024.10.02-15.10.04
map_path : data_maps/open-images-v6-mlperf/val_map.txt
mlperf_conf_path : /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
offline_expected_qps : 1.0
precision : int8
preprocessed_data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9J14 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=1, threads_per_core=1): 64}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=292.215448, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=292215448000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA L4', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=22.494140625, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=24152899584), max_power_limit=72.0, pci_id='0x27B810DE', compute_sm=89): 1})), numa_conf=None, system_id='e1ef67ab5fc2')
tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
test_mode : PerformanceOnly
use_graphs : False
user_conf_path : /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
system_id : e1ef67ab5fc2
config_name : e1ef67ab5fc2_retinanet_Offline
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
inference_server : lwis
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
[I] user.conf path: /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 73 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 126, GPU 473 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 128, GPU 483 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
[I] Device:0.GPU: [0] ./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 55, GPU 485 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 55, GPU 493 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1528, now: CPU 0, GPU 1596 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 56, GPU 2029 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 56, GPU 2039 (MiB)
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +1528, now: CPU 1, GPU 3124 (MiB)
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
F1002 15:10:07.041591 180493 lwis.cpp:245] Check failed: context->setOptimizationProfile(profileIdx) == true (0 vs. 1)
*** Check failure stack trace: ***
@     0x7fe94f4401c3  google::LogMessage::Fail()
@     0x7fe94f44525b  google::LogMessage::SendToLog()
@     0x7fe94f43febf  google::LogMessage::Flush()
@     0x7fe94f4406ef  google::LogMessageFatal::~LogMessageFatal()
@     0x55918ac33adc  lwis::Device::Setup()
@     0x55918ac35cab  lwis::Server::Setup()
@     0x55918ab91a00  doInference()
@     0x55918ab8f2b0  main
@     0x7fe93d00e083  __libc_start_main
@     0x55918ab8f83e  _start
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 231, in <module>