
🐛 [Bug] torch.compile crashes with IndexError

Open srdecny opened this issue 5 months ago • 6 comments

Bug Description

IndexError: tuple index out of range is raised when attempting to compile an exported model.

To Reproduce

Steps to reproduce the behavior:

  1. Use torch.export to export the model.
  2. Compile the exported model to TensorRT.
  3. Attempt to run the model.

exported_model = torch.export.export(model, example_inputs)
compiled = torch.compile(exported_model, backend="torch_tensorrt")
(...)
compiled(example_inputs)

results in

---------------------------------------------------------------------------
BackendCompilerFailed                     Traceback (most recent call last)
Cell In[10], line 16
---> 16     t, state = compiled(
     17         c,
     18         cl,
     19         state,
     20     )

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:375, in OptimizedModule.__call__(self, *args, **kwargs)
    365 if torch.nn.modules.module._has_any_global_hook():
    366     warnings.warn(
    367         "Using `torch.compile(module)` when there are global hooks on "
    368         "modules (e.g., from `register_module_forward_hook`); this will"
   (...)    373         stacklevel=2,
    374     )
--> 375 return super().__call__(*args, **kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch/nn/modules/module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)
   1771     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1772 else:
-> 1773     return self._call_impl(*args, **kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch/nn/modules/module.py:1784, in Module._call_impl(self, *args, **kwargs)
   1779 # If we don't have any hooks, we want to skip the rest of the logic in
   1780 # this function, and just call forward.
   1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1782         or _global_backward_pre_hooks or _global_backward_hooks
   1783         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1784     return forward_call(*args, **kwargs)
   1786 result = None
   1787 called_always_called_hooks = set()

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:749, in _TorchDynamoContext.__call__.<locals>.compile_wrapper(*args, **kwargs)
    745     raise e.with_traceback(None) from e.__cause__  # User compiler error
    746 except ShortenTraceback as e:
    747     # Failures in the backend likely don't have useful
    748     # data in the TorchDynamo frames, so we strip them out.
--> 749     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    750 finally:
    751     # Restore the dynamic layer stack depth if necessary.
    752     set_eval_frame(None)

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/output_graph.py:1871, in OutputGraph._call_user_compiler(self, gm, example_inputs)
   1869     raise e
   1870 except Exception as e:
-> 1871     raise BackendCompilerFailed(
   1872         self.compiler_fn, e, inspect.currentframe()
   1873     ).with_traceback(e.__traceback__) from None
   1875 signpost_event(
   1876     "dynamo",
   1877     "OutputGraph.call_user_compiler",
   (...)   1883     },
   1884 )
   1886 return compiled_fn

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/output_graph.py:1846, in OutputGraph._call_user_compiler(self, gm, example_inputs)
   1844 if config.verify_correctness:
   1845     compiler_fn = WrapperBackend(compiler_fn)
-> 1846 compiled_fn = compiler_fn(gm, example_inputs)
   1847 _step_logger()(logging.INFO, f"done compiler function {name}")
   1848 assert callable(compiled_fn), "compiler_fn did not return callable"

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py:150, in WrapBackendDebug.__call__(self, gm, example_inputs, **kwargs)
    148             raise
    149 else:
--> 150     compiled_gm = compiler_fn(gm, example_inputs)
    152 return compiled_gm

File ~/accentia/env2/lib/python3.11/site-packages/torch/__init__.py:2425, in _TorchCompileWrapper.__call__(self, model_, inputs_)
   2424 def __call__(self, model_, inputs_):
-> 2425     return self.compiler_fn(model_, inputs_, **self.kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/backend/backends.py:47, in torch_tensorrt_backend(gm, sample_inputs, **kwargs)
     43     set_log_level(logger.parent, logging.DEBUG)
     45 DEFAULT_BACKEND = aot_torch_tensorrt_aten_backend
---> 47 return DEFAULT_BACKEND(gm, sample_inputs, **kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/backend/backends.py:96, in aot_torch_tensorrt_aten_backend(gm, sample_inputs, **kwargs)
     92 if settings.offload_module_to_cpu:
     93     logger.warning(
     94         "The offload_module_to_cpu option is set, but it is being ignored since the torch_compile backend does not support this feature"
     95     )
---> 96 return _pretraced_backend(gm, sample_inputs, settings, engine_cache)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/backend/backends.py:164, in _pretraced_backend(gm, sample_inputs, settings, engine_cache)
    160         if settings.strip_engine_weights:
    161             logger.error(
    162                 "strip_engine_weights arg is not supported for torch.compile()"
    163             )
--> 164         trt_compiled = compile_module(
    165             gm,
    166             torchtrt_inputs,
    167             settings=settings,
    168             engine_cache=engine_cache,
    169         )
    170         return trt_compiled
    171 except (AssertionError, RuntimeError):

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/_compiler.py:759, in compile_module(gm, sample_arg_inputs, sample_kwarg_inputs, settings, engine_cache, _debugger_config)
    756 CONVERTERS.set_compilation_settings(settings)
    758 # Check the number of supported operations in the graph
--> 759 num_supported_ops, total_ops = partitioning.get_graph_converter_support(
    760     gm, settings.torch_executed_ops
    761 )
    763 dryrun_tracker.total_ops_in_graph = total_ops
    764 dryrun_tracker.supported_ops_in_graph = num_supported_ops

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/partitioning/common.py:196, in get_graph_converter_support(graph_module, torch_executed_ops)
    193     if node.op == "call_function":
    194         total_functional_nodes += 1
--> 196         if op_support.is_node_supported(module_dict, node):
    197             number_of_supported_nodes += 1
    199 # Print node support overview prior to partitioning

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/partitioning/_global_partitioner.py:153, in TorchTensorRTOperatorSupport.is_node_supported(self, submodules, node)
    147 def is_node_supported(
    148     self, submodules: Mapping[str, torch.nn.Module], node: torch.fx.Node
    149 ) -> bool:
    150     node_name = ConverterRegistry.qualified_name_or_str(node.target)
    152     if (
--> 153         (node in CONVERTERS or node.op == "get_attr")
    154         and node_name not in self.torch_executed_ops
    155         and node.target not in self.torch_executed_ops
    156     ):
    157         # If node is a proper, supported computational node, store the operator
    158         if not node.is_impure() and node.op != "get_attr":
    159             if node_name not in self.supported_operators:

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_ConverterRegistry.py:528, in ConverterRegistry.__contains__(self, key)
    525 try:
    526     # Attempt to access the item in the registry
    527     if isinstance(key, Node):
--> 528         self.__getitem__(key)
    529     else:
    530         self.__getitem_without_validation__(key)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_ConverterRegistry.py:458, in ConverterRegistry.__getitem__(self, node)
    451 logger.debug(f"Converter options for {key}: {len(converters)}")
    452 for i, candidate in enumerate(converters):
    453     # We enable the converter under 4 conditions
    454     # 1) capability validator is True
    455     # 2) Assume dynamic_shape support is True
    456     # 3) Node only has static shaped inputs
    457     # 4) Node has dynamic inputs and the converter has supports_dynamic_shapes=True
--> 458     if is_valid := candidate.capability_validator(
    459         node, self.compilation_settings
    460     ) and (
    461         assume_dynamic_shape_support
    462         or not node_has_dynamic_shapes(node)
    463         or candidate.supports_dynamic_shapes
    464     ):
    465         logger.debug(
    466             f"Selecting converter option {i} for converting {key}"
    467         )
    468         return (
    469             candidate.converter_implementation,
    470             calling_convention,
   (...)    474             },
    475         )

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/aten_ops_converters.py:2661, in sort_validator(node, settings)
   2659     return False
   2660 shape = meta_data.shape
-> 2661 dim = node.args[1]
   2662 dim = get_positive_dim(dim, len(shape))
   2663 k = shape[dim]

BackendCompilerFailed: backend='torch_tensorrt' raised:
IndexError: tuple index out of range
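
From the last frame, sort_validator appears to assume the sort dimension is always the second positional argument. aten.sort's schema is sort(Tensor self, int dim=-1, bool descending=False), so if the traced graph leaves dim at its default (or passes it by keyword), node.args has a single element and node.args[1] raises. A defensive lookup along these lines (my sketch, not the library's actual code) would avoid the crash:

from torch.fx import Node

def get_sort_dim(node: Node) -> int:
    # dim may be absent from node.args when it is defaulted or passed by keyword
    if len(node.args) > 1:
        return node.args[1]
    return node.kwargs.get("dim", -1)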

Expected behavior

Compilation should either succeed or fail with a clearer error message.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.8.0
  • PyTorch Version (e.g. 1.0): 2.8.0
  • CPU Architecture: x86
  • OS (e.g., Linux): Ubuntu 22.04
  • How you installed PyTorch (conda, pip, libtorch, source): uv pip
  • Build command you used (if compiling from source): using pip
  • Are you using local sources or building from archives: using pip
  • Python version: 3.11
  • CUDA version: 12.8
  • GPU models and configuration: H100
  • Any other relevant information:

srdecny commented Aug 14 '25 12:08

The same error appears even if the model is not exported first (i.e. the raw model is passed to torch.compile directly).
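
For concreteness, the direct path fails the same way (model and example_inputs stand in for the actual proprietary model and its inputs):

compiled = torch.compile(model, backend="torch_tensorrt")
compiled(*example_inputs)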

srdecny commented Aug 14 '25 13:08

Can you provide the model that is not compiling properly?

narendasan commented Aug 14 '25 20:08

@narendasan The model is pretty large and proprietary, so not really. I've tried reducing the model to something that crashes, and I managed to isolate this:

import torch
import torch_tensorrt
import os

os.environ["TORCHDYNAMO_VERBOSE"] = "1"

def sequence_mask(lengths, max_length=None):
    if max_length is None:
        max_length = int(lengths.max().item())
    return torch.arange(max_length, device=lengths.device)[None, :] < lengths[:, None]

class Repro(torch.nn.Module):
    def forward(self, x, x_lens):
        mask_bt = sequence_mask(x_lens, x.shape[1])           # (B, T) bool
        mask_bth = (~mask_bt).unsqueeze(-1).expand_as(x)      # (B, T, H) bool
        x = x.clone()
        x[mask_bth] = 0.0                                     # in-place boolean assignment
        return x

if __name__ == "__main__":
    device = "cuda"
    B, T, H = 2, 16, 256
    x = torch.randn(B, T, H, device=device, dtype=torch.float16)
    x_lens = torch.tensor([T, T - 3], device=device, dtype=torch.long)

    # Works without compile
    m = Repro().to(device).to(torch.float16)
    _ = m(x, x_lens)
    
    compiled_full = torch.compile(m, fullgraph=True)
    _ = compiled_full(x, x_lens)

    # Fails when compiled with torch_tensorrt backend
    compiled_tensorrt = torch.compile(m, backend="torch_tensorrt")
    _ = compiled_tensorrt(x, x_lens)

crashes with

---------------------------------------------------------------------------
BackendCompilerFailed                     Traceback (most recent call last)
Cell In[1], line 35
     33 # Fails when compiled with torch_tensorrt backend
     34 compiled_tensorrt = torch.compile(m, backend="torch_tensorrt")
---> 35 _ = compiled_tensorrt(x, x_lens)

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:375, in OptimizedModule.__call__(self, *args, **kwargs)
    365 if torch.nn.modules.module._has_any_global_hook():
    366     warnings.warn(
    367         "Using `torch.compile(module)` when there are global hooks on "
    368         "modules (e.g., from `register_module_forward_hook`); this will"
   (...)    373         stacklevel=2,
    374     )
--> 375 return super().__call__(*args, **kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch/nn/modules/module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)
   1771     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1772 else:
-> 1773     return self._call_impl(*args, **kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch/nn/modules/module.py:1784, in Module._call_impl(self, *args, **kwargs)
   1779 # If we don't have any hooks, we want to skip the rest of the logic in
   1780 # this function, and just call forward.
   1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1782         or _global_backward_pre_hooks or _global_backward_hooks
   1783         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1784     return forward_call(*args, **kwargs)
   1786 result = None
   1787 called_always_called_hooks = set()

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:749, in _TorchDynamoContext.__call__.<locals>.compile_wrapper(*args, **kwargs)
    745     raise e.with_traceback(None) from e.__cause__  # User compiler error
    746 except ShortenTraceback as e:
    747     # Failures in the backend likely don't have useful
    748     # data in the TorchDynamo frames, so we strip them out.
--> 749     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    750 finally:
    751     # Restore the dynamic layer stack depth if necessary.
    752     set_eval_frame(None)

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/output_graph.py:1871, in OutputGraph._call_user_compiler(self, gm, example_inputs)
   1869     raise e
   1870 except Exception as e:
-> 1871     raise BackendCompilerFailed(
   1872         self.compiler_fn, e, inspect.currentframe()
   1873     ).with_traceback(e.__traceback__) from None
   1875 signpost_event(
   1876     "dynamo",
   1877     "OutputGraph.call_user_compiler",
   (...)   1883     },
   1884 )
   1886 return compiled_fn

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/output_graph.py:1846, in OutputGraph._call_user_compiler(self, gm, example_inputs)
   1844 if config.verify_correctness:
   1845     compiler_fn = WrapperBackend(compiler_fn)
-> 1846 compiled_fn = compiler_fn(gm, example_inputs)
   1847 _step_logger()(logging.INFO, f"done compiler function {name}")
   1848 assert callable(compiled_fn), "compiler_fn did not return callable"

File ~/accentia/env2/lib/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py:150, in WrapBackendDebug.__call__(self, gm, example_inputs, **kwargs)
    148             raise
    149 else:
--> 150     compiled_gm = compiler_fn(gm, example_inputs)
    152 return compiled_gm

File ~/accentia/env2/lib/python3.11/site-packages/torch/__init__.py:2425, in _TorchCompileWrapper.__call__(self, model_, inputs_)
   2424 def __call__(self, model_, inputs_):
-> 2425     return self.compiler_fn(model_, inputs_, **self.kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/backend/backends.py:47, in torch_tensorrt_backend(gm, sample_inputs, **kwargs)
     43     set_log_level(logger.parent, logging.DEBUG)
     45 DEFAULT_BACKEND = aot_torch_tensorrt_aten_backend
---> 47 return DEFAULT_BACKEND(gm, sample_inputs, **kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/backend/backends.py:96, in aot_torch_tensorrt_aten_backend(gm, sample_inputs, **kwargs)
     92 if settings.offload_module_to_cpu:
     93     logger.warning(
     94         "The offload_module_to_cpu option is set, but it is being ignored since the torch_compile backend does not support this feature"
     95     )
---> 96 return _pretraced_backend(gm, sample_inputs, settings, engine_cache)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/backend/backends.py:164, in _pretraced_backend(gm, sample_inputs, settings, engine_cache)
    160         if settings.strip_engine_weights:
    161             logger.error(
    162                 "strip_engine_weights arg is not supported for torch.compile()"
    163             )
--> 164         trt_compiled = compile_module(
    165             gm,
    166             torchtrt_inputs,
    167             settings=settings,
    168             engine_cache=engine_cache,
    169         )
    170         return trt_compiled
    171 except (AssertionError, RuntimeError):

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/_compiler.py:952, in compile_module(gm, sample_arg_inputs, sample_kwarg_inputs, settings, engine_cache, _debugger_config)
    950 # Create TRT engines from submodule
    951 if not settings.dryrun:
--> 952     trt_module = convert_module(
    953         submodule,
    954         submodule_inputs,
    955         settings=settings,
    956         name=name,
    957         engine_cache=engine_cache,
    958     )
    960     trt_modules[name] = trt_module
    962     if _debugger_config:

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py:88, in convert_module(module, inputs, settings, name, engine_cache)
     71 def convert_module(
     72     module: torch.fx.GraphModule,
     73     inputs: Sequence[Input],
   (...)     76     engine_cache: Optional[BaseEngineCache] = None,
     77 ) -> PythonTorchTensorRTModule | TorchTensorRTModule:
     78     """Convert an FX module to a TRT module
     79     Args:
     80         module: FX GraphModule to convert
   (...)     86         PythonTorchTensorRTModule or TorchTensorRTModule
     87     """
---> 88     interpreter_result = interpret_module_to_result(
     89         module, inputs, settings, engine_cache=engine_cache
     90     )
     92     rt_cls = PythonTorchTensorRTModule
     94     if ENABLED_FEATURES.torch_tensorrt_runtime and not settings.use_python_runtime:

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py:67, in interpret_module_to_result(module, inputs, settings, arg_inputs, kwarg_inputs, engine_cache)
     55 output_dtypes = infer_module_output_dtypes(
     56     module, truncate_double=settings.truncate_double
     57 )
     59 interpreter = TRTInterpreter(
     60     module,
     61     inputs,
   (...)     64     engine_cache=engine_cache,
     65 )
---> 67 interpreter_result = interpreter.run()
     68 return interpreter_result

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py:721, in TRTInterpreter.run(self, strict_type_constraints, algorithm_selector, tactic_sources)
    718             if interpreter_result is not None:  # hit the cache
    719                 return interpreter_result  # type: ignore[no-any-return]
--> 721 self._construct_trt_network_def()
    723 if not self.compilation_settings.immutable_weights:
    724     self._save_weight_mapping()

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py:399, in TRTInterpreter._construct_trt_network_def(self)
    397 self.input_specs_iter = 0
    398 run_module_start_time = datetime.now()
--> 399 super().run()
    400 _LOGGER.info(
    401     f"TRT INetwork construction elapsed time: {datetime.now() - run_module_start_time}"
    402 )

File ~/accentia/env2/lib/python3.11/site-packages/torch/fx/interpreter.py:173, in Interpreter.run(self, initial_env, enable_io_processing, *args)
    170     continue
    172 try:
--> 173     self.env[node] = self.run_node(node)
    174 except Exception as e:
    175     if self.extra_traceback:

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py:784, in TRTInterpreter.run_node(self, n)
    779 if _LOGGER.isEnabledFor(logging.DEBUG):
    780     _LOGGER.debug(
    781         f"Converting node {self._cur_node_name} (kind: {n.target}, args: {TRTInterpreter._args_str(n.args)})"
    782     )
--> 784 trt_node: torch.fx.Node = super().run_node(n)
    786 if n.op == "get_attr":
    787     self.const_mapping[str(n)] = (tuple(trt_node.shape), str(trt_node.dtype))

File ~/accentia/env2/lib/python3.11/site-packages/torch/fx/interpreter.py:242, in Interpreter.run_node(self, n)
    240 assert isinstance(args, tuple)
    241 assert isinstance(kwargs, dict)
--> 242 return getattr(self, n.op)(n.target, args, kwargs)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py:891, in TRTInterpreter.call_function(self, target, args, kwargs)
    889     return converter(self.ctx.net, target, args, kwargs, self._cur_node_name)
    890 else:
--> 891     return converter(self.ctx, target, args, kwargs, self._cur_node_name)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/converter_utils.py:681, in enforce_tensor_types.<locals>.wrapper.<locals>.convert_with_type_enforcement(ctx, target, args, kwargs, name)
    678     elif isinstance(index, str):
    679         new_kwargs[index] = new_value
--> 681 return func(ctx, target, new_args, new_kwargs, name)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/aten_ops_converters.py:870, in aten_ops_index_put(ctx, target, args, kwargs, name)
    854 @dynamo_tensorrt_converter(
    855     torch.ops.aten.index_put.default,
    856 )
   (...)    868     name: str,
    869 ) -> Union[TRTTensor, Sequence[TRTTensor]]:
--> 870     return impl.select.index_put_converter(
    871         ctx,
    872         target,
    873         SourceIR.ATEN,
    874         name,
    875         args[0],
    876         args[1],
    877         args[2],
    878         args_bounds_check(args, 3, False),
    879     )

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/impl/select.py:524, in index_put_converter(ctx, target, source_ir, name, input_tensor, input_indices, values, accumulate)
    520     assert idx is not None
    521     idx_reshaped = impl.shuffle.reshape(
    522         ctx, target, source_ir, f"{name}_reshape_idx_I_{i}", idx, (idx.shape[0], 1)
    523     )
--> 524     expanded_idx = impl.slice.expand(
    525         ctx,
    526         target,
    527         source_ir,
    528         f"{name}_expand_idx_I_{i}",
    529         idx_reshaped,
    530         (N, F_volume),
    531     )
    532     I_tensors.append(expanded_idx)
    534 # Create a meshgrid for free dimensions (F)

File ~/accentia/env2/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/impl/slice/ops.py:228, in expand(ctx, target, source_ir, name, input_t, shape)
    219 def expand(
    220     ctx: ConversionContext,
    221     target: Target,
   (...)    225     shape: Shape,
    226 ) -> TRTTensor:
    227     shape_rank = len(shape)
--> 228     initial_tensor_rank = len(input_t.shape)
    230     # If the rank of the input tensor is less than the shape's rank, pad with ones
    231     if initial_tensor_rank < shape_rank:

BackendCompilerFailed: backend='torch_tensorrt' raised:
ValueError: __len__() should return >= 0

While executing %index_put : [num_users=1] = call_function[target=torch.ops.aten.index_put.default](args = (%clone_2, [%expand], %_frozen_param1), kwargs = {})
GraphModule: class GraphModule(torch.nn.Module):
    def forward(self, arg1_1: "i64[2][1]", arg0_1: "f16[2, 16, 256][4096, 256, 1]"):
        # No stacktrace found for following nodes
        slice_2: "i64[2][1]" = torch.ops.aten.slice.Tensor(arg1_1, 0, 0, 9223372036854775807);  arg1_1 = None
        unsqueeze_1: "i64[2, 1][1, 1]" = torch.ops.aten.unsqueeze.default(slice_2, 1);  slice_2 = None
        _frozen_param0: "i64[1, 16][16, 1]" = self._frozen_param0
        lt: "b8[2, 16][16, 1]" = torch.ops.aten.lt.Tensor(_frozen_param0, unsqueeze_1);  _frozen_param0 = unsqueeze_1 = None
        bitwise_not: "b8[2, 16][16, 1]" = torch.ops.aten.bitwise_not.default(lt);  lt = None
        unsqueeze_2: "b8[2, 16, 1][16, 1, 1]" = torch.ops.aten.unsqueeze.default(bitwise_not, -1);  bitwise_not = None
        expand: "b8[2, 16, 256][16, 1, 0]" = torch.ops.aten.expand.default(unsqueeze_2, [2, 16, 256]);  unsqueeze_2 = None
        clone_2: "f16[2, 16, 256][4096, 256, 1]" = torch.ops.aten.clone.default(arg0_1);  arg0_1 = None
        _frozen_param1: "f16[][]" = self._frozen_param1
        index_put: "f16[2, 16, 256][4096, 256, 1]" = torch.ops.aten.index_put.default(clone_2, [expand], _frozen_param1);  clone_2 = expand = _frozen_param1 = None
        return index_put
        

Original traceback:
None
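
For reference, the dumped graph boils the masking down to a single aten.index_put node with a boolean index tensor. A standalone eager equivalent of that node, as I read the dump (shapes from the repro, B=2, T=16, H=256):

import torch

x = torch.randn(2, 16, 256)                       # clone_2
mask = torch.zeros(2, 16, 256, dtype=torch.bool)  # expand: broadcast boolean mask
mask[1, 13:, :] = True                            # padding positions for x_lens = [16, 13]
value = torch.tensor(0.0)                         # _frozen_param1, a 0-dim tensor

out = torch.ops.aten.index_put.default(x, [mask], value)
assert (out[1, 13:] == 0).all()
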
  • I get a warning: "TensorRT-LLM is not installed. Please install TensorRT-LLM or set TRTLLM_PLUGINS_PATH to the directory containing libnvinfer_plugin_tensorrt_llm.so to use converters for torch.distributed ops." I assume this is not relevant; I tried to install it but ran into issues with missing symbols in the libraries.
  • I am aware it's not the same error as reported originally, but the model does contain this masking logic.

Could this crash be related to the original error above, or is it something else?

srdecny commented Aug 15 '25 07:08

I get a warning: "TensorRT-LLM is not installed. Please install TensorRT-LLM or set TRTLLM_PLUGINS_PATH to the directory containing libnvinfer_plugin_tensorrt_llm.so to use converters for torch.distributed ops." I assume this is not relevant; I tried to install it but ran into issues with missing symbols in the libraries.

Yes, this is not related.

  1. This may be a different bug, @apbose please investigate this. The original bug is due to a validator which determines if a node can be converted to TensorRT. But the logs don't really give us the information we need (the node type and the schema of the node). I believe debug logging should be able to expose this information. @cehongwang can you provide an example of how to use the debugger?
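
In the meantime, bumping the torch_tensorrt logger to DEBUG should surface the per-node converter lookups. This assumes standard Python logging, which the set_log_level call in the backend traceback suggests; it is not a substitute for the full debugger:

import logging
import torch

logging.basicConfig()
logging.getLogger("torch_tensorrt").setLevel(logging.DEBUG)

# model / inputs as in the repro above
compiled = torch.compile(model, backend="torch_tensorrt")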

narendasan commented Aug 15 '25 14:08

Ok I will take a look

apbose commented Aug 15 '25 15:08

@srdecny I need to look into this in more detail, but can you try something like the following?

def forward(self, x, x_lens):
    B, T, H = x.shape
    # create a float mask instead of a boolean one
    mask_bt = sequence_mask(x_lens, T).to(x.dtype)  # (B, T)
    mask_bt = mask_bt.unsqueeze(-1)                 # (B, T, 1)
    mask_bt = mask_bt.expand(B, T, H)               # (B, T, H)

    # multiply instead of in-place boolean assignment
    x = x * mask_bt  # sets masked positions to zero
    return x

basically converting the mask to float and avoiding the boolean indexing altogether?
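
For completeness, a runnable assembly of this suggestion (ReproNoIndexPut is my name for it; sequence_mask is reused from the repro above, and whether this actually dodges the index_put converter still needs verifying):

import torch

def sequence_mask(lengths, max_length=None):
    if max_length is None:
        max_length = int(lengths.max().item())
    return torch.arange(max_length, device=lengths.device)[None, :] < lengths[:, None]

class ReproNoIndexPut(torch.nn.Module):
    def forward(self, x, x_lens):
        T = x.shape[1]
        mask = sequence_mask(x_lens, T).to(x.dtype)  # (B, T); 1.0 = keep, 0.0 = padding
        return x * mask.unsqueeze(-1)                # (B, T, 1) broadcasts over H

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(2, 16, 256, device=device, dtype=torch.float16)
x_lens = torch.tensor([16, 13], device=device)
_ = ReproNoIndexPut()(x, x_lens)  # eager sanity check before trying backend="torch_tensorrt"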

apbose commented Aug 20 '25 02:08