dynamic shape for slice converter
I have just added the test cases and have not refactored `has_dynamic_shape` and `set_layer_name`, since they are already part of https://github.com/pytorch/TensorRT/pull/2853.
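For reference, a dynamic-shape slice test case would look roughly like the following (a minimal sketch; the module and shape ranges are illustrative, not the PR's actual tests). Dim 1 is dynamic, and the slice is taken along it:

import torch
import torch_tensorrt

class SliceModule(torch.nn.Module):
    def forward(self, x):
        # Slice along dim 1, which is dynamic at compile time.
        return x[:, 1:, :]

inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 1, 4),
        opt_shape=(1, 8, 4),
        max_shape=(1, 16, 4),
        dtype=torch.float32,
    )
]
trt_mod = torch_tensorrt.compile(
    SliceModule().eval().cuda(), ir="dynamo", inputs=inputs, min_block_size=1
)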
- Can we perform slice on dynamic dimensions?
- I'm facing slice layer conversion issues on llama2.
Here's the reproducer:
- Please log in via huggingface-cli (install it via pip install -U "huggingface_hub[cli]"): https://huggingface.co/docs/huggingface_hub/en/guides/cli#huggingface-cli-login. The user access token can be found in your settings. Script:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_tensorrt

llama_path = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(llama_path).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(llama_path)

base_prompt = "How many hours are in a day?"
base_inputs = tokenizer(base_prompt, return_tensors="pt").to("cuda:0")
input_ids = base_inputs.input_ids
pyt_out = model(input_ids)

seq_len = torch.export.Dim("seq_len", min=2, max=1024)

from torch.nn.attention import SDPBackend

with torch.nn.attention.sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    ep = torch.export.export(model, (input_ids,), dynamic_shapes=({1: seq_len},))

trt_model = torch_tensorrt.dynamo.compile(
    ep,
    inputs=[input_ids],
    enabled_precisions={torch.float16},
    min_block_size=1,
    truncate_long_and_double=True,
    debug=True,
)
The error is:
DEBUG:torch_tensorrt.dynamo.conversion._TRTInterpreter:Converting node /model_layers_0_self_attn_rotary_emb/slice_3 (kind: aten.slice.Tensor, args: ('[SHUFFLE]-[aten_ops.unsqueeze.default]-[/model_layers_0_self_attn_rotary_emb/unsqueeze_5]_output <tensorrt.ITensor [shape=(1, 1, -1), dtype=DataType.INT32]>', 2, 0, 9223372036854775807))
Traceback (most recent call last):
File "/work/TensorRT/llama2_auto.py", line 21, in <module>
enabled_precisions={torch.float16},
File "/work/TensorRT/py/torch_tensorrt/dynamo/_compiler.py", line 227, in compile
trt_gm = compile_module(gm, inputs, settings)
File "/work/TensorRT/py/torch_tensorrt/dynamo/_compiler.py", line 412, in compile_module
trt_module = convert_module(
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py", line 106, in convert_module
interpreter_result = interpret_module_to_result(module, inputs, settings)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py", line 87, in interpret_module_to_result
interpreter_result = interpreter.run()
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 309, in run
super().run()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/fx/interpreter.py", line 145, in run
self.env[node] = self.run_node(node)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 348, in run_node
trt_node: torch.fx.Node = super().run_node(n)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/fx/interpreter.py", line 202, in run_node
return getattr(self, n.op)(n.target, args, kwargs)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 456, in call_function
return converter(self.ctx, target, args, kwargs, self._cur_node_name)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/converter_utils.py", line 400, in convert_with_type_enforcement
return func(ctx, target, new_args, new_kwargs, name)
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py", line 769, in aten_ops_slice
return impl.slice.slice_op(
File "/work/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/slice/ops.py", line 50, in slice_op
assert input.shape[dim] != -1, "Can't slice on dynamic shape dimension!"
AssertionError: Can't slice on dynamic shape dimension!
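If it helps, the usual way converters support this in the TensorRT Python API is to compute the slice's output size at runtime from the input's shape tensor and attach it with ISliceLayer.set_input, rather than asserting on -1. A minimal sketch for the slice-to-end case hit above (stop = INT64_MAX, step 1); the helper name is mine and this is not the PR's actual code:

import numpy as np
import tensorrt as trt

def slice_to_end_dynamic(network, input_tensor, dim, start):
    # Sketch: input_tensor[..., start:, ...] along `dim` (step 1), where
    # that dim may be dynamic (-1). `start` is a static Python int.
    rank = len(input_tensor.shape)
    start_dims = [0] * rank
    start_dims[dim] = start

    # Output size is computed at runtime: size = shape - start_dims,
    # elementwise (int32 shape tensors assumed, as in pre-10.x TensorRT;
    # cast if your version emits int64 shape tensors).
    shape = network.add_shape(input_tensor).get_output(0)
    start_c = network.add_constant(
        (rank,), np.array(start_dims, dtype=np.int32)
    ).get_output(0)
    size = network.add_elementwise(
        shape, start_c, trt.ElementWiseOperation.SUB
    ).get_output(0)

    # The static shape argument below is only a placeholder; set_input(2, ...)
    # overrides it so the slice takes its output shape from `size` at runtime.
    layer = network.add_slice(
        input_tensor,
        trt.Dims(start_dims),
        trt.Dims([0] * rank),
        trt.Dims([1] * rank),
    )
    layer.set_input(2, size)
    return layer.get_output(0)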
While testing the llama model, I get this error:
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (seq_len)! For more information, run with TORCH_LOGS="+dynamic".
- Not all values of seq_len = L['input_ids'].size()[1] in the specified range seq_len <= 1024 satisfy the generated guard Ne(Mod(L['input_ids'].size()[1], 64), 0).
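As the message suggests, rerunning the export with dynamic-shape logging enabled shows where the offending guard is generated (using the reproducer script name from the traceback above):

TORCH_LOGS="+dynamic" python llama2_auto.py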
Could you provide the path of the model used here: llama_path = "meta-llama/Llama-2-7b-hf"?
Hi @peri044, the code now gets past the above error points and also past the Python int overflow error. However, I now see this error:
Traceback (most recent call last):
File "/code/torch_tensorrt/TensorRT/tests/py/dynamo/conversion/llama.py", line 20, in <module>
trt_model = torch_tensorrt.dynamo.compile(ep, inputs=[input_ids],
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 227, in compile
trt_gm = compile_module(gm, inputs, settings)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 365, in compile_module
submodule_inputs = partitioning.construct_submodule_inputs(submodule)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch_tensorrt/dynamo/partitioning/common.py", line 116, in construct_submodule_inputs
raise AssertionError(
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly
Oddly, this comes after the engine creation debug prints here: INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:11.626404 INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 363528004 bytes of Memory
The above error comes from `construct_submodule_inputs`, which I believe should run before engine creation. Is it because of a multi-GPU setting? WARNING: [Torch-TensorRT] - Detected this engine is being instantitated in a multi-GPU system with multi-device safe mode disabled.
Attached the logs for the same: llama_out.log
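For what it's worth, one way to narrow this down is to check which placeholders in the exported graph are missing the fake-tensor metadata ("val" in node.meta) that the partitioner reads when building submodule inputs. A rough diagnostic sketch, assuming ep is the exported program from the reproducer:

for node in ep.module().graph.nodes:
    if node.op == "placeholder" and "val" not in node.meta:
        print(f"placeholder without 'val' metadata: {node.name}")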
@apbose I'm seeing the following error:
File "/home/dperi/Downloads/TensorRT/py/torch_tensorrt/dynamo/conversion/converter_utils.py", line 530, in convert_with_type_enforcement
return func(ctx, target, new_args, new_kwargs, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dperi/Downloads/TensorRT/py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py", line 884, in aten_ops_slice
return impl.slice.slice_op(
^^^^^^^^^^^^^^^^^^^^
File "/home/dperi/Downloads/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/slice/ops.py", line 59, in slice_op
stop_slice[dim] = stop
~~~~~~~~~~^^^^^
TypeError: __setitem__(): incompatible function arguments. The following argument types are supported:
1. (self: tensorrt_bindings.tensorrt.Dims, arg0: int, arg1: int) -> None
2. (self: tensorrt_bindings.tensorrt.Dims, arg0: slice, arg1: tensorrt_bindings.tensorrt.Dims) -> None
Invoked with: (1, 1, 1024, 1024), 2, <tensorrt_bindings.tensorrt.ITensor object at 0x73495841e030>
While executing %slice_3 : [num_users=2] = call_function[target=torch.ops.aten.slice.Tensor](args = (%_frozen_param1, 2, 0, %sym_size_int_1), kwargs = {_itensor_to_tensor_meta: {<tensorrt_bindings.tensorrt.ITensor object at 0x7349584b9db0>: ((1, s0), torch.int64, False, (s0, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x734958ba4770>: None, <tensorrt_bindings.tensorrt.ITensor object at 0x7349585498f0>: None,
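The proximate failure here is that stop arrives as a TensorRT ITensor (produced by %sym_size_int_1) while trt.Dims.__setitem__ only accepts Python ints, so the converter's static path cannot handle it. Roughly, the converter would need to branch on the bound's type; a sketch of the check only, not the PR's implementation:

import tensorrt as trt

def is_runtime_bound(v) -> bool:
    # A slice bound that depends on a dynamic dim (e.g. sym_size_int_1)
    # reaches the converter as an ITensor rather than a Python int.
    return isinstance(v, trt.ITensor)

# static path:  stop_slice[dim] = stop
# dynamic path: build the output-size tensor with shape/elementwise layers
#               and attach it via slice_layer.set_input(2, size_tensor)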
To reproduce:
- Please build torch_tensorrt from the `llm_examples_main` branch.
- Run this example: https://github.com/pytorch/TensorRT/blob/llm_examples_main/examples/dynamo/torch_export_gpt2.py and also comment out the `torch_executed_ops` line: https://github.com/pytorch/TensorRT/blob/llm_examples_main/examples/dynamo/torch_export_gpt2.py#L70. Let me know if you're able to reproduce this error.
Hello, I got the same error as you:
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly
How did you solve it?