TensorRT
❓ [Question] Unexpected exception _Map_base::at during PTQ
❓ Question
I am attempting to run post-training quantization (PTQ). During compilation, I get the following exception:
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Finalize: %142 : Tensor = aten::matmul(%x, %143) # /fsx_home/homes/srdecny/meaning/vocoder/hifigan/hifigan/vec2enc.py:84:0 Set kernel index: 5
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total number of generated kernels selected for the engine: 7
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 0 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 1 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 2 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 3 TRT_SERIALIZABLE:generatedNativePointwise
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 4 TRT_SERIALIZABLE:generatedNativePointwise
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 5 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 6 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Disabling unused tactic source: JIT_CONVOLUTIONS
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Engine generation completed in 1.64955 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total per-runner device persistent memory is 0
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total per-runner host persistent memory is 73616
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Allocated activation device memory of size 33692160
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +32, now: CPU 0, GPU 888 (MiB)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - CUDA lazy loading is enabled.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Calculating Maxima
INFO: [Torch-TensorRT TorchScript Conversion Context] - Starting Calibration.
INFO: [Torch-TensorRT TorchScript Conversion Context] - Post Processing Calibration data in 8.6e-07 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Assigning tensor scales: (Unnamed Layer* 164) [Concatenation]_output using (Unnamed Layer* 164) [Concatenation]_output [
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 1: Unexpected exception _Map_base::at
Traceback (most recent call last):
File "/fsx_home/homes/srdecny/meaning/vojta_notebooks/trt_quant_single_v1.py", line 435, in <module>
quanted = trt_decoder = torch_tensorrt.compile(
^^^^^^^^^^^^^^^^^^^^^^^
File "/fsx_home/homes/srdecny/meaning/env_bender6_3.11/lib/python3.11/site-packages/torch_tensorrt/_compile.py", line 185, in compile
compiled_ts_module: torch.jit.ScriptModule = torchscript_compile(
^^^^^^^^^^^^^^^^^^^^
File "/fsx_home/homes/srdecny/meaning/env_bender6_3.11/lib/python3.11/site-packages/torch_tensorrt/ts/_compiler.py", line 151, in compile
compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [Error thrown at core/conversion/conversionctx/ConversionCtx.cpp:169] Building serialized network failed in TensorRT
I don't really know how to proceed from here. What does this exception indicate? If I understand correctly, _Map_base::at is the libstdc++ internal behind std::unordered_map::at, which throws when the requested key does not exist, so presumably something inside TensorRT is looking up a tensor or layer that is missing from one of its maps.
The compilation code is roughly this:
import torch
import torch_tensorrt

# INT8 calibrator fed by the calibration dataloader
calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
    dloader,
    cache_file="./encoder_calibrator.cache",
    use_cache=False,
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=DEVICE,
)

inputs = model.dummy_inputs()
trace = torch.jit.trace(model, inputs, check_trace=False, strict=False)
signature = torch_tensorrt.Input(shape=inputs.shape, dtype=inputs.dtype)

torch_tensorrt.compile(
    trace,
    input_signature=signature,
    enabled_precisions={torch.float, torch.int8, torch.half},
    calibrator=calibrator,
    truncate_long_and_double=True,
)
inputs is a single float tensor (albeit a very large one). Unfortunately, I can't share the model.
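For context, dloader is an ordinary torch DataLoader over pre-computed input tensors. A minimal sketch of what it looks like (the CalibSet class, sample count, and tensor dimensions below are simplified stand-ins, not the real model's):

import torch
from torch.utils.data import DataLoader, Dataset

class CalibSet(Dataset):
    """Stand-in for the real calibration set; shapes are illustrative only."""
    def __init__(self, n: int = 8):
        # each sample is a single float input of the kind the model consumes
        self.samples = [torch.randn(4, 1, 1025) for _ in range(n)]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return self.samples[idx]

# batch_size=1, so each calibration batch is one float tensor
dloader = DataLoader(CalibSet(), batch_size=1, shuffle=False)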
What you have already tried
All I managed to find online was this issue, where someone suggests that the calibration dataloader might be empty. However, the following runs without any exception:
dummy_inputs = model.dummy_inputs()
trace = torch.jit.trace(model, dummy_inputs, check_trace=False, strict=False)
trace(dummy_inputs)  # the traced model still works

for input in dloader:
    trace(input)  # the model also works with batches from the calibration dataloader
Additionally, running the compilation inside torch_tensorrt.logging.debug() shows TensorRT log lines such as INFO: [Torch-TensorRT TorchScript Conversion Context] - Calibrated batch 0 in 0.473829 seconds., indicating that the dataloader itself is fine.
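The debug output quoted above comes from wrapping the compile call roughly like this (assuming the torch_tensorrt.logging context manager available in 2.2):

import torch_tensorrt

# raise Torch-TensorRT's log level to DEBUG for the duration of the compile
with torch_tensorrt.logging.debug():
    torch_tensorrt.compile(
        trace,
        input_signature=signature,
        enabled_precisions={torch.float, torch.int8, torch.half},
        calibrator=calibrator,
        truncate_long_and_double=True,
    )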
The compilation log entries that reference the layer where it errors out are:
DEBUG: [Torch-TensorRT] - ITensor name: (Unnamed Layer* 164) [Concatenation]_output
DEBUG: [Torch-TensorRT] - ITensor shape: [1, 4, 1, 1025]
DEBUG: [Torch-TensorRT] - ITensor type: Float32
DEBUG: [Torch-TensorRT] - Output tensor shape: [1, 4, 1]
<...>
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Eliminating concatenation (Unnamed Layer* 164) [Concatenation]
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 160) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 161) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 162) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 163) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After concat removal: 161 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After tensor merging: 161 layers
I can provide the full compilation log if needed.
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): 2.2.1
- CPU Architecture: x86
- OS (e.g., Linux): Ubuntu 22.04
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Python version: 3.11.8
- CUDA version: 12.1
- GPU models and configuration: g5.2xlarge on AWS (NVIDIA A10G)
Relevant pip packages:
- tensorrt 8.6.1.post1
- tensorrt-bindings 8.6.1
- tensorrt-libs 8.6.1
- torch_tensorrt 2.2.0