System Info
- 2x H100 80GB on docker container (nvidia/cuda:12.4.1-devel-ubuntu22.04)
- latest version of the library (TensorRT-LLM 0.12.0.dev2024080600, per the log below)
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Installation of tensorrt_llm as described in https://nvidia.github.io/TensorRT-LLM/installation/linux.html
- Download of the llava-v1.6-34b-hf model as described in https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md
- Run the script: python3 build_visual_engine.py --model_type llava_next --model_path tmp/hf_models/${MODEL_NAME} --max_batch_size 5 (a smaller standalone reproduction is sketched below)
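For completeness, here is a smaller standalone script that I believe exercises the same export path outside of build_visual_engine.py. This is an untested sketch: openai/clip-vit-large-patch14-336 is my stand-in assumption for the llava-v1.6-34b vision tower, and attn_implementation="sdpa" is passed explicitly to match the CLIPSdpaAttention frames in the traceback below.

```python
# Untested minimal sketch: ONNX-export a CLIP vision tower directly.
# Assumption: openai/clip-vit-large-patch14-336 stands in for the
# llava-v1.6-34b vision tower; "sdpa" matches the CLIPSdpaAttention
# frames in the traceback below.
import torch
from transformers import CLIPVisionModel

model = CLIPVisionModel.from_pretrained(
    "openai/clip-vit-large-patch14-336",
    attn_implementation="sdpa",
).eval()
model.config.return_dict = False  # tuple outputs trace more cleanly

dummy = torch.randn(1, 3, 336, 336)
with torch.no_grad():
    torch.onnx.export(model, dummy, "clip_vision.onnx", opset_version=17)
```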
Expected behavior
Conversion of the visual encoder to .engine format.
actual behavior
Received an error:
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024080600
Loading checkpoint shards: 100%|██████████| 15/15 [00:01<00:00, 11.09it/s]
[08/08/2024-10:27:28] [TRT] [I] Exporting onnx to tmp/trt_engines/llava-v1.6-34b-hf/vision_encoder/onnx/model.onnx
Traceback (most recent call last):
File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 817, in
builder.build()
File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 84, in build
build_llava_engine(args)
File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 374, in build_llava_engine
export_onnx(wrapper, image, f'{args.output_dir}/onnx')
File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 118, in export_onnx
torch.onnx.export(model,
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 516, in export
_export(
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1612, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1138, in _model_to_graph
graph = _optimize_graph(
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 677, in _optimize_graph
graph = _C._jit_pass_onnx(graph, operator_export_type)
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1956, in _run_symbolic_function
return symbolic_fn(graph_context, *inputs, **attrs)
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_helper.py", line 306, in wrapper
return fn(g, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_opset14.py", line 176, in scaled_dot_product_attention
query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 87, in op
return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in _add_op
inputs = [_const_if_tensor(graph_context, arg) for arg in args]
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in
inputs = [_const_if_tensor(graph_context, arg) for arg in args]
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 269, in _const_if_tensor
return _add_op(graph_context, "onnx::Constant", value_z=arg)
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 246, in _add_op
node = _create_node(
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 305, in _create_node
_add_attribute(node, key, value, aten=aten)
File "/usr/local/lib/python3.10/dist-packages/torch/onnx/internal/jit_utils.py", line 356, in add_attribute
return getattr(node, f"{kind}")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:
1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node
Invoked with: %482 : Tensor = onnx::Constant(), scope: __main__.build_llava_engine.<locals>.LlavaNextVisionWrapper::/transformers.models.clip.modeling_clip.CLIPVisionTransformer::vision_tower/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn
, 'value', 0.125
(Occurred when translating scaled_dot_product_attention).
additional notes
The environment was set up exactly as described in the documentation, using a Docker container with Ubuntu (https://nvidia.github.io/TensorRT-LLM/installation/linux.html).
The build of the LLM part completed successfully, without any issues, so the failure seems specific to the visual-engine step.
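Reading the trace: the opset-14 symbolic for scaled_dot_product_attention receives the attention scale as a plain Python float (0.125, i.e. 1/sqrt(64) for the CLIP head dimension) and tries to wrap it in an onnx::Constant through the tensor-only z_ attribute setter, which is what raises the TypeError. If that reading is correct, one workaround I have not yet tried (my assumption, not an official fix) would be to load the model with eager attention so the export never reaches the SDPA symbolic:

```python
# Untested workaround sketch: force the eager attention implementation so
# the ONNX export never hits the scaled_dot_product_attention symbolic.
# My assumption is that this would replace the from_pretrained call inside
# build_visual_engine.py; side effects on the rest of the build are untested.
import torch
from transformers import LlavaNextForConditionalGeneration

model = LlavaNextForConditionalGeneration.from_pretrained(
    "tmp/hf_models/llava-v1.6-34b-hf",
    torch_dtype=torch.float16,
    attn_implementation="eager",  # bypass CLIPSdpaAttention
)
```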
I suspect the problem may be due to a version mismatch between the libraries installed by the TensorRT-LLM installation procedure.
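For reference, this is how I would capture the exact versions in play; the package list is my guess at what is relevant to the export path.

```python
# Print installed versions of the packages most likely involved in the
# ONNX export path; the package list here is a guess, adjust as needed.
import importlib.metadata as md

for pkg in ("torch", "transformers", "onnx", "tensorrt", "tensorrt_llm"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```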
Beyond that guess, I have no idea how to resolve this issue. Could anyone provide some guidance on where to start?
Thank you.