
❓ [Question] Unexpected exception _Map_base::at during PTQ


❓ Question

I am attempting to run post-training quantization (PTQ) with Torch-TensorRT. During compilation, I get the following exception:

DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Finalize: %142 : Tensor = aten::matmul(%x, %143) # /fsx_home/homes/srdecny/meaning/vocoder/hifigan/hifigan/vec2enc.py:84:0 Set kernel index: 5
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total number of generated kernels selected for the engine: 7
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 0 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 1 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 2 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 3 TRT_SERIALIZABLE:generatedNativePointwise
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 4 TRT_SERIALIZABLE:generatedNativePointwise
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 5 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Kernel: 6 CASK_STATIC
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Disabling unused tactic source: JIT_CONVOLUTIONS
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Engine generation completed in 1.64955 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total per-runner device persistent memory is 0
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total per-runner host persistent memory is 73616
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Allocated activation device memory of size 33692160
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +32, now: CPU 0, GPU 888 (MiB)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - CUDA lazy loading is enabled.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Calculating Maxima
INFO: [Torch-TensorRT TorchScript Conversion Context] - Starting Calibration.
INFO: [Torch-TensorRT TorchScript Conversion Context] -   Post Processing Calibration data in 8.6e-07 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Assigning tensor scales: (Unnamed Layer* 164) [Concatenation]_output using (Unnamed Layer* 164) [Concatenation]_output [
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 1: Unexpected exception _Map_base::at
Traceback (most recent call last):
  File "/fsx_home/homes/srdecny/meaning/vojta_notebooks/trt_quant_single_v1.py", line 435, in <module>
    quanted = trt_decoder = torch_tensorrt.compile(
                            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx_home/homes/srdecny/meaning/env_bender6_3.11/lib/python3.11/site-packages/torch_tensorrt/_compile.py", line 185, in compile
    compiled_ts_module: torch.jit.ScriptModule = torchscript_compile(
                                                 ^^^^^^^^^^^^^^^^^^^^
  File "/fsx_home/homes/srdecny/meaning/env_bender6_3.11/lib/python3.11/site-packages/torch_tensorrt/ts/_compiler.py", line 151, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [Error thrown at core/conversion/conversionctx/ConversionCtx.cpp:169] Building serialized network failed in TensorRT

I don't really know how to proceed from here. What does this exception indicate?

The compilation code is roughly this:

import torch
import torch_tensorrt

# model, dloader (the calibration DataLoader) and DEVICE are defined elsewhere.
calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
    dloader,
    cache_file="./encoder_calibrator.cache",
    use_cache=False,
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=DEVICE,
)

inputs = model.dummy_inputs()
trace = torch.jit.trace(model, inputs, check_trace=False, strict=False)
signature = torch_tensorrt.Input(shape=inputs.shape, dtype=inputs.dtype)

trt_model = torch_tensorrt.compile(
    trace,
    input_signature=signature,
    enabled_precisions={torch.float, torch.int8, torch.half},
    calibrator=calibrator,
    truncate_long_and_double=True,
)

inputs is a single float Tensor (although very large). Unfortunately, I can't share the model.
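
Since the failure happens after calibration finishes, while TensorRT is assigning INT8 tensor scales, one check I can run to narrow this down is to build the same trace without INT8 at all. This is only a diagnostic sketch (it reuses the trace and signature from the snippet above); if an FP32/FP16 build succeeds, it would isolate the problem to the PTQ path rather than the conversion itself:

# Diagnostic build without INT8: no calibrator is involved, so if this
# compiles cleanly, the failure is specific to the quantization path.
fp16_model = torch_tensorrt.compile(
    trace,
    input_signature=signature,
    enabled_precisions={torch.float, torch.half},
    truncate_long_and_double=True,
)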

What you have already tried

All I managed to find online was this issue, where someone suggests that the calibration dataloader might be empty. However, the following runs without any exception:

dummy_inputs = model.dummy_inputs()
trace = torch.jit.trace(model, dummy_inputs, check_trace=False, strict=False)

trace(dummy_inputs)  # the traced model still works
for batch in dloader:
    trace(batch)  # the model also works with batches from the calibration dataloader
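
Another way to take the dataloader out of the picture entirely would be to calibrate from the cache file instead. This is a sketch that assumes the failed run got far enough to write ./encoder_calibrator.cache, and that torch_tensorrt.ptq.CacheCalibrator behaves as described in the PTQ docs:

# Assumption: a previous run wrote ./encoder_calibrator.cache before failing.
# A cache-based calibrator reads the scales from disk, so if compilation still
# fails when passing this as calibrator=, the dataloader is not the cause.
cache_calibrator = torch_tensorrt.ptq.CacheCalibrator(
    "./encoder_calibrator.cache",
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
)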

Additionally, running the compilation under torch_tensorrt.logging.debug() shows TensorRT's logs containing INFO: [Torch-TensorRT TorchScript Conversion Context] - Calibrated batch 0 in 0.473829 seconds., indicating that the dataloader itself is fine.
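
For completeness, this is how I captured the debug logs above; I believe the exact spelling in 2.2 is the torch_tensorrt.logging.debug() context manager:

# Assumption: torch_tensorrt.logging.debug() raises the log level to DEBUG
# for everything inside the with-block (torch_tensorrt 2.2).
with torch_tensorrt.logging.debug():
    torch_tensorrt.compile(
        trace,
        input_signature=signature,
        enabled_precisions={torch.float, torch.int8, torch.half},
        calibrator=calibrator,
        truncate_long_and_double=True,
    )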

The lines of the compilation log that reference the layer where it errors out are:

DEBUG: [Torch-TensorRT] - ITensor name: (Unnamed Layer* 164) [Concatenation]_output
DEBUG: [Torch-TensorRT] - ITensor shape: [1, 4, 1, 1025]
DEBUG: [Torch-TensorRT] - ITensor type: Float32
DEBUG: [Torch-TensorRT] - Output tensor shape: [1, 4, 1]
<...>
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Eliminating concatenation (Unnamed Layer* 164) [Concatenation]
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 160) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 161) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 162) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Generating copy for (Unnamed Layer* 163) [Shuffle]_output to (Unnamed Layer* 164) [Concatenation]_output because copy elision is disabled for concat.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After concat removal: 161 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After tensor merging: 161 layers

I can provide the full compilation log, if needed.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 2.2.1
  • CPU Architecture: x86
  • OS (e.g., Linux): Ubuntu 22.04
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Python version: 3.11.8
  • CUDA version: 12.1
  • GPU models and configuration: G5.2xlarge on AWS
tensorrt                  8.6.1.post1
tensorrt-bindings         8.6.1
tensorrt-libs             8.6.1
torch_tensorrt            2.2.0

srdecny · Apr 26 '24 18:04