dynamic shape for slice converter
I have just added the test cases and have not refactored `has_dynamic_shape` and `set_layer_name`, since they are already part of https://github.com/pytorch/TensorRT/pull/2853.
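For reference, a dynamic-shape slice test case would look roughly like the following (a minimal sketch; the module and shape ranges are illustrative, not the PR's actual tests). Dim 1 is dynamic, and the slice is taken along it:

import torch
import torch_tensorrt

class SliceModule(torch.nn.Module):
    def forward(self, x):
        # Slice along dim 1, which is dynamic at compile time.
        return x[:, 1:, :]

inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 1, 4),
        opt_shape=(1, 8, 4),
        max_shape=(1, 16, 4),
        dtype=torch.float32,
    )
]
trt_mod = torch_tensorrt.compile(
    SliceModule().eval().cuda(), ir="dynamo", inputs=inputs, min_block_size=1
)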
- Can we perform slice on dynamic dimensions?
- I'm facing slice layer conversion issues on llama2.
Here's the reproducer:
- Please log in via huggingface-cli (install it via pip install -U "huggingface_hub[cli]"): https://huggingface.co/docs/huggingface_hub/en/guides/cli#huggingface-cli-login. The user access token can be found in your settings. Script:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_tensorrt

llama_path = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(llama_path).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(llama_path)

base_prompt = "How many hours are in a day?"
base_inputs = tokenizer(base_prompt, return_tensors="pt").to("cuda:0")
input_ids = base_inputs.input_ids
pyt_out = model(input_ids)

seq_len = torch.export.Dim("seq_len", min=2, max=1024)

from torch.nn.attention import SDPBackend

with torch.nn.attention.sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    ep = torch.export.export(model, (input_ids,), dynamic_shapes=({1: seq_len},))

trt_model = torch_tensorrt.dynamo.compile(
    ep,
    inputs=[input_ids],
    enabled_precisions={torch.float16},
    min_block_size=1,
    truncate_long_and_double=True,
    debug=True,
)
The error is:
DEBUG:torch_tensorrt.dynamo.conversion._TRTInterpreter:Converting node /model_layers_0_self_attn_rotary_emb/slice_3 (kind: aten.slice.Tensor, args: ('[SHUFFLE]-[aten_ops.unsqueeze.default]-[/model_layers_0_self_attn_rotary_emb/unsqueeze_5]_output <tensorrt.ITensor [shape=(1, 1, -1), dtype=DataType.INT32]>', 2, 0, 9223372036854775807))
Traceback (most recent call last):
File "/work/TensorRT/llama2_auto.py", line 21, in <module>
enabled_precisions={torch.float16},
File "/work/TensorRT/py/torch_tensorrt/dynamo/_compiler.py", line 227, in compile
trt_gm = compile_module(gm, inputs, settings)
File "/work/TensorRT/py/torch_tensorrt/dynamo/_compiler.py", line 412, in compile_module
trt_module = convert_module(
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py", line 106, in convert_module
interpreter_result = interpret_module_to_result(module, inputs, settings)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py", line 87, in interpret_module_to_result
interpreter_result = interpreter.run()
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 309, in run
super().run()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/fx/interpreter.py", line 145, in run
self.env[node] = self.run_node(node)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 348, in run_node
trt_node: torch.fx.Node = super().run_node(n)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/fx/interpreter.py", line 202, in run_node
return getattr(self, n.op)(n.target, args, kwargs)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 456, in call_function
return converter(self.ctx, target, args, kwargs, self._cur_node_name)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/converter_utils.py", line 400, in convert_with_type_enforcement
return func(ctx, target, new_args, new_kwargs, name)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py", line 769, in aten_ops_slice
return impl.slice.slice_op(
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/slice/ops.py", line 50, in slice_op
assert input.shape[dim] != -1, "Can't slice on dynamic shape dimension!"
AssertionError: Can't slice on dynamic shape dimension!
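If it helps, the usual way converters support this in the TensorRT Python API is to compute the slice's output size at runtime from the input's shape tensor and attach it with ISliceLayer.set_input, rather than asserting on -1. A minimal sketch for the slice-to-end case hit above (stop = INT64_MAX, step 1); the helper name is mine and this is not the PR's actual code:

import numpy as np
import tensorrt as trt

def slice_to_end_dynamic(network, input_tensor, dim, start):
    # Sketch: input_tensor[..., start:, ...] along `dim` (step 1), where
    # that dim may be dynamic (-1). `start` is a static Python int.
    rank = len(input_tensor.shape)
    start_dims = [0] * rank
    start_dims[dim] = start

    # Output size is computed at runtime: size = shape - start_dims,
    # elementwise (int32 shape tensors assumed, as in pre-10.x TensorRT;
    # cast if your version emits int64 shape tensors).
    shape = network.add_shape(input_tensor).get_output(0)
    start_c = network.add_constant(
        (rank,), np.array(start_dims, dtype=np.int32)
    ).get_output(0)
    size = network.add_elementwise(
        shape, start_c, trt.ElementWiseOperation.SUB
    ).get_output(0)

    # The static shape argument below is only a placeholder; set_input(2, ...)
    # overrides it so the slice takes its output shape from `size` at runtime.
    layer = network.add_slice(
        input_tensor,
        trt.Dims(start_dims),
        trt.Dims([0] * rank),
        trt.Dims([1] * rank),
    )
    layer.set_input(2, size)
    return layer.get_output(0)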
While testing the llama model, I get this error:
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (seq_len)! For more information, run with TORCH_LOGS="+dynamic".
- Not all values of seq_len = L['input_ids'].size()[1] in the specified range seq_len <= 1024 satisfy the generated guard Ne(Mod(L['input_ids'].size()[1], 64), 0).
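As the message suggests, rerunning the export with dynamic-shape logging enabled shows where the offending guard is generated (using the reproducer script name from the traceback above):

TORCH_LOGS="+dynamic" python llama2_auto.py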
Could you provide the path of the model used here: llama_path = "meta-llama/Llama-2-7b-hf"?
Hi @peri044, the code now gets past the above error points and also past the Python int overflow error. However, I now see this error:
Traceback (most recent call last):
File "/code/torch_tensorrt/TensorRT/tests/py/dynamo/conversion/llama.py", line 20, in <module>
trt_model = torch_tensorrt.dynamo.compile(ep, inputs=[input_ids],
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 227, in compile
trt_gm = compile_module(gm, inputs, settings)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 365, in compile_module
submodule_inputs = partitioning.construct_submodule_inputs(submodule)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch_tensorrt/dynamo/partitioning/common.py", line 116, in construct_submodule_inputs
raise AssertionError(
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly
Oddly, this comes after the engine creation debug prints here: INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:11.626404 INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 363528004 bytes of Memory
The above error comes from `construct_submodule_inputs`, which I believe should run before engine creation. Is it because of a multi-GPU setting? WARNING: [Torch-TensorRT] - Detected this engine is being instantitated in a multi-GPU system with multi-device safe mode disabled.
Attached the logs for the same: llama_out.log
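For what it's worth, one way to narrow this down is to check which placeholders in the exported graph are missing the fake-tensor metadata ("val" in node.meta) that the partitioner reads when building submodule inputs. A rough diagnostic sketch, assuming ep is the exported program from the reproducer:

for node in ep.module().graph.nodes:
    if node.op == "placeholder" and "val" not in node.meta:
        print(f"placeholder without 'val' metadata: {node.name}")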
@apbose I'm seeing the following error:
File "/home/dperi/Downloads/TensorRT/py/torch_tensorrt/dynamo/conversion/converter_utils.py", line 530, in convert_with_type_enforcement
return func(ctx, target, new_args, new_kwargs, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dperi/Downloads/TensorRT/py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py", line 884, in aten_ops_slice
return impl.slice.slice_op(
^^^^^^^^^^^^^^^^^^^^
File "/home/dperi/Downloads/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/slice/ops.py", line 59, in slice_op
stop_slice[dim] = stop
~~~~~~~~~~^^^^^
TypeError: __setitem__(): incompatible function arguments. The following argument types are supported:
1. (self: tensorrt_bindings.tensorrt.Dims, arg0: int, arg1: int) -> None
2. (self: tensorrt_bindings.tensorrt.Dims, arg0: slice, arg1: tensorrt_bindings.tensorrt.Dims) -> None
Invoked with: (1, 1, 1024, 1024), 2, <tensorrt_bindings.tensorrt.ITensor object at 0x73495841e030>
While executing %slice_3 : [num_users=2] = call_function[target=torch.ops.aten.slice.Tensor](args = (%_frozen_param1, 2, 0, %sym_size_int_1), kwargs = {_itensor_to_tensor_meta: {<tensorrt_bindings.tensorrt.ITensor object at 0x7349584b9db0>: ((1, s0), torch.int64, False, (s0, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x734958ba4770>: None, <tensorrt_bindings.tensorrt.ITensor object at 0x7349585498f0>: None,
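The proximate failure here is that stop arrives as a TensorRT ITensor (produced by %sym_size_int_1) while trt.Dims.__setitem__ only accepts Python ints, so the converter's static path cannot handle it. Roughly, the converter would need to branch on the bound's type; a sketch of the check only, not the PR's implementation:

import tensorrt as trt

def is_runtime_bound(v) -> bool:
    # A slice bound that depends on a dynamic dim (e.g. sym_size_int_1)
    # reaches the converter as an ITensor rather than a Python int.
    return isinstance(v, trt.ITensor)

# static path:  stop_slice[dim] = stop
# dynamic path: build the output-size tensor with shape/elementwise layers
#               and attach it via slice_layer.set_input(2, size_tensor)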
To reproduce:
- Please build torch_tensorrt from the `llm_examples_main` branch.
- Run this example: https://github.com/pytorch/TensorRT/blob/llm_examples_main/examples/dynamo/torch_export_gpt2.py and also comment out the `torch_executed_ops` line: https://github.com/pytorch/TensorRT/blob/llm_examples_main/examples/dynamo/torch_export_gpt2.py#L70. Let me know if you're able to reproduce this error.
Hello, I got the same error as you:
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly
How did you solve it?