inference_results_v3.0
NVIDIA make generate_engines Error Code 4: Internal Error Network has dynamic or shape inputs
- Run generate_engines:

```cmd
make generate_engines RUN_ARGS="--benchmarks=bert --scenarios=offline"
```

This leads to the error: Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
The detailed error is as follows:
(mlperf) user@mlperf-inference-user-x86_64:/work$ make generate_engines RUN_ARGS="--benchmarks=bert --scenarios=offline"
[2023-07-12 18:56:08,807 main.py:231 INFO] Detected system ID: KnownSystem.H100_PCIe_80GB_Custom
[2023-07-12 18:56:11,032 generate_engines.py:172 INFO] Building engines for bert benchmark in Offline scenario...
[2023-07-12 18:56:11,057 bert_var_seqlen.py:67 INFO] Using workspace size: 0
[07/12/2023-18:56:11] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 38, GPU 928 (MiB)
[07/12/2023-18:56:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2981, GPU +750, now: CPU 3096, GPU 1680 (MiB)
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [I] Using default for use_int8_scale_max: true
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
...
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[2023-07-12 18:56:18,733 bert_var_seqlen.py:215 INFO] Building ./build/engines/H100_PCIe_80GB_Custom/bert/Offline/bert-Offline-gpu-_S_384_B_0_P_0_vs.custom_k_99_MaxP.plan
[07/12/2023-18:56:18] [TRT] [E] 4: [network.cpp::validate::3036] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/work/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/work/code/bert/tensorrt/bert_var_seqlen.py", line 231, in build_engines
assert engine is not None, "Engine Build Failed!"
AssertionError: Engine Build Failed!
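For context, my understanding of this error is that the network has at least one input with a dynamic dimension, but no optimization profile was added to the builder config before the build. Below is a minimal standalone sketch of the pattern TensorRT expects; the input name "input_ids" and the min/opt/max shapes are illustrative assumptions, not the harness's actual values:

```python
# Minimal sketch (not the MLPerf harness code): a network with a dynamic
# batch dimension needs an optimization profile before the engine build.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Toy network: one dynamic-batch input passed through an identity layer.
inp = network.add_input("input_ids", trt.int32, (-1, 384))
identity = network.add_identity(inp)
network.mark_output(identity.get_output(0))

# Without this profile, the build fails with the same Error Code 4.
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", min=(1, 384), opt=(8, 384), max=(64, 384))
config.add_optimization_profile(profile)

engine = builder.build_serialized_network(network, config)
assert engine is not None
```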
- The detected system info is as follows (I have already set the system ID to H100_PCIe_80GB_Custom):
(mlperf) user@mlperf-inference-user-x86_64:/work$ python3 -m scripts.custom_systems.add_custom_system
This script creates a custom system definition within the MLPerf Inference codebase that matches the
hardware specifications of the system that it is run on. The script then does the following:
- Backs up NVIDIA's workload configuration files
- Creates new workload configuration files (configs/<Benchmark name>/<Scenario>/__init__.py) with dummy values
- The user should fill out these dummy values with the correct values
============= DETECTED SYSTEM ==============
SystemConfiguration:
System ID (Optional Alias): H100_PCIe_80GB_Custom
CPUConfiguration:
2x CPU (CPUArchitecture.x86_64): Intel(R) Xeon(R) Platinum 8480+
56 Cores, 2 Threads/Core
MemoryConfiguration: 528.08 GB (Matching Tolerance: 0.05)
AcceleratorConfiguration:
2x GPU (0x233110DE): NVIDIA H100 PCIe
AcceleratorType: Discrete
SM Compute Capability: 90
Memory Capacity: 79.65 GiB
Max Power Limit: 310.0 W
NUMA Config String: &
Thanks for any hints on this issue.
Here, "B0" means batch size used is 0 which is invalid.
We support the NVIDIA implementation inside CM and have currently tested it on L4, T4, A100, and RTX 4090. We'll be very happy to assist you if you can test it on H100. The instructions are here: https://github.com/mlcommons/ck/blob/master/docs/mlperf/inference/bert/README_nvidia.md and there is a public Discord channel for any queries.