inference_results_v3.0

NVIDIA make generate_engines Error Code 4: Internal Error Network has dynamic or shape inputs

wohenniubi opened this issue 2 years ago · 1 comment

  • Running the generate_engines command make generate_engines RUN_ARGS="--benchmarks=bert --scenarios=offline" leads to the error Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)

The detailed error is as follows:

(mlperf) user@mlperf-inference-user-x86_64:/work$ make generate_engines RUN_ARGS="--benchmarks=bert --scenarios=offline"
[2023-07-12 18:56:08,807 main.py:231 INFO] Detected system ID: KnownSystem.H100_PCIe_80GB_Custom
[2023-07-12 18:56:11,032 generate_engines.py:172 INFO] Building engines for bert benchmark in Offline scenario...
[2023-07-12 18:56:11,057 bert_var_seqlen.py:67 INFO] Using workspace size: 0
[07/12/2023-18:56:11] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 38, GPU 928 (MiB)
[07/12/2023-18:56:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2981, GPU +750, now: CPU 3096, GPU 1680 (MiB)
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [I] Using default for use_int8_scale_max: true
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
...
[07/12/2023-18:56:18] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[2023-07-12 18:56:18,733 bert_var_seqlen.py:215 INFO] Building ./build/engines/H100_PCIe_80GB_Custom/bert/Offline/bert-Offline-gpu-_S_384_B_0_P_0_vs.custom_k_99_MaxP.plan
[07/12/2023-18:56:18] [TRT] [E] 4: [network.cpp::validate::3036] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/bert/tensorrt/bert_var_seqlen.py", line 231, in build_engines
    assert engine is not None, "Engine Build Failed!"
AssertionError: Engine Build Failed!
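
For context on the TensorRT error itself: an engine cannot be built from a network with dynamic input shapes unless the builder config carries an optimization profile with valid min/opt/max shapes. A minimal standalone sketch of what that normally looks like follows; the tensor name and shape bounds are illustrative placeholders, not the actual MLPerf harness code:

import tensorrt as trt

# Minimal sketch: register an optimization profile for a dynamic-batch input.
# "input_ids" and the (min, opt, max) bounds are placeholder values.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

network.add_input("input_ids", trt.int32, (-1, 384))  # -1 marks the dynamic batch dimension

profile = builder.create_optimization_profile()
# Every dynamic input needs min/opt/max shapes; without a registered profile,
# building fails with exactly the Error Code 4 shown above.
profile.set_shape("input_ids", min=(1, 384), opt=(64, 384), max=(1280, 384))
config.add_optimization_profile(profile)

In the harness, the profile shapes are derived from the benchmark's configured batch size, which is why a batch size that resolves to 0 can leave the network without a usable profile (see the reply below).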


  • The detected system info is as follows; I have already set the system ID to H100_PCIe_80GB_Custom:
(mlperf) user@mlperf-inference-user-x86_64:/work$  python3 -m scripts.custom_systems.add_custom_system
This script creates a custom system definition within the MLPerf Inference codebase that matches the
hardware specifications of the system that it is run on. The script then does the following:

    - Backs up NVIDIA's workload configuration files
    - Creates new workload configuration files (configs/<Benchmark name>/<Scenario>/__init__.py) with dummy values
        - The user should fill out these dummy values with the correct values

============= DETECTED SYSTEM ==============

SystemConfiguration:
    System ID (Optional Alias): H100_PCIe_80GB_Custom
    CPUConfiguration:
        2x CPU (CPUArchitecture.x86_64): Intel(R) Xeon(R) Platinum 8480+
            56 Cores, 2 Threads/Core
    MemoryConfiguration: 528.08 GB (Matching Tolerance: 0.05)
    AcceleratorConfiguration:
        2x GPU (0x233110DE): NVIDIA H100 PCIe
            AcceleratorType: Discrete
            SM Compute Capability: 90
            Memory Capacity: 79.65 GiB
            Max Power Limit: 310.0 W
    NUMA Config String: &


Thanks for any hints on this issue.

wohenniubi · Jul 12 '23 19:07

Here, "B0" means batch size used is 0 which is invalid.

We support the NVIDIA implementation inside CM and have currently tested it on L4, T4, A100, and RTX 4090. We'll be very happy to assist you if you can test it on H100. Here are the instructions: https://github.com/mlcommons/ck/blob/master/docs/mlperf/inference/bert/README_nvidia.md, and there is a public Discord channel for any queries.

arjunsuresh · Jul 21 '23 20:07