inference_results_v3.0
NVIDIA RuntimeError: FP8 weight is not found in dir /work/build/models/bert/fp8/faster-transformer-bert-fp8-weights-scales
- To reproduce the problem
After modifying use_fp8 from False to True as follows:
@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class H100_PCIE_80GB_CUSTOM(OfflineGPUBaseConfig):
    system = KnownSystem.H100_PCIe_80GB_Custom

    # Applicable fields for this benchmark are listed below. Not all of these are necessary,
    # and some may be defined in the BaseConfig already and inherited.
    # Please see NVIDIA's submission config files for example values and which fields to keep.
    # Required fields (Must be set or inherited to run):
    gpu_batch_size: int = 0
    input_dtype: str = ''
    ...
    use_fp8: bool = True  # the default is False
running the command make generate_engines RUN_ARGS="--benchmarks=bert --scenarios=offline" fails with RuntimeError: FP8 weight is not found in dir. The detailed error output is as follows:
[2023-07-12 18:57:23,967 main.py:231 INFO] Detected system ID: KnownSystem.H100_PCIe_80GB_Custom
[2023-07-12 18:57:26,192 generate_engines.py:172 INFO] Building engines for bert benchmark in Offline scenario...
Loading TensorRT plugin from build/plugins/../FasterTransformer/build/lib/libbert_fp8_plugin.so
[2023-07-12 18:57:26,220 bert_var_seqlen.py:67 INFO] Using workspace size: 0
[07/12/2023-18:57:26] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 38, GPU 928 (MiB)
[07/12/2023-18:57:32] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2981, GPU +750, now: CPU 3096, GPU 1680 (MiB)
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/work/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/work/code/bert/tensorrt/bert_var_seqlen.py", line 210, in build_engines
bert_squad_fp8_fastertransfomer(network, weights_dict, self.bert_config, self.seq_len)
File "/work/code/bert/tensorrt/fp8_builder_fastertransformer.py", line 49, in bert_squad_fp8_fastertransfomer
raise RuntimeError(f"FP8 weight is not found in dir {weightDirPath}, Exiting...")
RuntimeError: FP8 weight is not found in dir /work/build/models/bert/fp8/faster-transformer-bert-fp8-weights-scales/, Exiting...
[2023-07-12 18:57:36,206 generate_engines.py:172 INFO] Building engines for bert benchmark in Offline scenario...
Loading TensorRT plugin from build/plugins/../FasterTransformer/build/lib/libbert_fp8_plugin.so
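The traceback shows the build aborting in fp8_builder_fastertransformer.py because nothing usable is found under /work/build/models/bert/fp8/faster-transformer-bert-fp8-weights-scales/. Before re-running generate_engines, a small check like the sketch below can confirm whether that directory exists and contains any files at all; the path is copied from the error message, and no assumption is made about the file names inside:

#!/usr/bin/env python3
# Sanity check: does the directory the FP8 BERT builder looks in exist,
# and does it contain any weight/scale files? The path is taken verbatim
# from the RuntimeError above.
from pathlib import Path

weight_dir = Path("/work/build/models/bert/fp8/faster-transformer-bert-fp8-weights-scales")

if not weight_dir.is_dir():
    print(f"Missing directory: {weight_dir}")
else:
    files = sorted(p.name for p in weight_dir.iterdir())
    if not files:
        print(f"Directory exists but is empty: {weight_dir}")
    else:
        print(f"{len(files)} file(s) present, e.g.: {files[:5]}")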
- SystemID setup
The system ID is set to H100_PCIe_80GB_Custom using the command python3 -m scripts.custom_systems.add_custom_system, which prints the following:
This script creates a custom system definition within the MLPerf Inference codebase that matches the
hardware specifications of the system that it is run on. The script then does the following:
- Backs up NVIDIA's workload configuration files
- Creates new workload configuration files (configs/<Benchmark name>/<Scenario>/__init__.py) with dummy values
- The user should fill out these dummy values with the correct values
============= DETECTED SYSTEM ==============
SystemConfiguration:
System ID (Optional Alias): H100_PCIe_80GB_Custom
CPUConfiguration:
2x CPU (CPUArchitecture.x86_64): Intel(R) Xeon(R) Platinum 8480+
56 Cores, 2 Threads/Core
MemoryConfiguration: 528.08 GB (Matching Tolerance: 0.05)
AcceleratorConfiguration:
2x GPU (0x233110DE): NVIDIA H100 PCIe
AcceleratorType: Discrete
SM Compute Capability: 90
Memory Capacity: 79.65 GiB
Max Power Limit: 310.0 W
NUMA Config String: &
============================================
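For completeness, the dummy fields that add_custom_system writes into configs/bert/Offline/__init__.py still have to be replaced with real values before the engine build can succeed. The fragment below is a hypothetical sketch of a filled-in config, assuming the imports and base classes already present in the generated file; every concrete value (batch size, dtype, precision, expected QPS) is a placeholder chosen for illustration and should be taken from NVIDIA's reference H100 submission configs, with use_fp8 only enabled once the FP8 weights/scales are actually present in the directory checked above:

# Hypothetical sketch of a filled-in custom BERT Offline config. Field names
# mirror the generated dummy file; the values are illustrative placeholders,
# not NVIDIA's tuned submission values.
@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class H100_PCIE_80GB_CUSTOM(OfflineGPUBaseConfig):
    system = KnownSystem.H100_PCIe_80GB_Custom

    gpu_batch_size: int = 1024        # placeholder; tune to GPU memory
    input_dtype: str = "int32"        # BERT inputs are tokenized integer IDs
    precision: str = "fp16"           # assumed field; harness compute precision
    offline_expected_qps: int = 5000  # placeholder throughput target
    use_fp8: bool = True              # requires the FP8 weights/scales to exist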