onnx-tensorrt
Einsum_110: einsum input tensor 0 has invalid type Int32
Description
This issue is a follow-up to #818, which I also created. I am working on the transformer-deploy repository and created a PR that enables support for exporting larger transformer models to TensorRT.
This works well with gpt2-medium, gpt-neo-1.3b, and gpt-neo-2.7b. However, for GPT-J I am running into the following issue:
[04/29/2022-14:36:29] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[04/29/2022-14:36:29] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[04/29/2022-14:36:29] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[04/29/2022-14:36:29] [TRT] [E] [layers.cpp::validate::5677] Error Code 3: Internal Error (Einsum_110: einsum input tensor 0 has invalid type Int32)
[04/29/2022-14:36:29] [TRT] [W] building engine. depending on model size this may take a while
[04/29/2022-14:36:29] [TRT] [E] 4: [network.cpp::validate::2633] Error Code 4: Internal Error (Network must have at least one output)
[04/29/2022-14:36:29] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
Traceback (most recent call last):
File "/usr/local/bin/convert_model", line 8, in <module>
sys.exit(entrypoint())
File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 386, in entrypoint
main(commands=args)
File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 288, in main
engine: ICudaEngine = build_engine(
File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/trt_utils.py", line 132, in build_engine
engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine
Invoked with: <tensorrt.tensorrt.Runtime object at 0x7fc959cd7c70>, None
The TypeError at the end of the traceback is just a consequence of the failed build: buildSerializedNetwork returns None, which is then passed to deserialize_cuda_engine. As all other models seemingly work well, I assume this might be directly related to TRT?
Environment
TensorRT Version: 8.2.2-1
ONNX-TensorRT Version / Branch: 8.2.2.1
GPU Type: V100
Nvidia Driver Version: 495.29.05
CUDA Version: 11.5
CUDNN Version:
Operating System + Version: Ubuntu 20.04.3 LTS
Python Version (if applicable): 3.8.10
TensorFlow + TF2ONNX Version (if applicable): NA
PyTorch Version (if applicable): 1.10.2+cu113
Baremetal or Container (if container which image + tag): See below
Steps To Reproduce
- Clone this PR: https://github.com/ELS-RD/transformer-deploy/pull/67
- cd into the main folder
- Build the image: docker build -t tfdeploy .
- Run:
docker run -it --rm --shm-size=24g --ulimit memlock=-1 --ulimit stack=67108864 --gpus device=0 \
-v $PWD:/project tfdeploy \
bash -c "cd /project && \
convert_model -m 'EleutherAI/gpt-j-6B' \
--backend tensorrt \
--seq-len 1 128 128 \
--fast \
--task text-generation"
I am tagging @kevinch-nv and @yuanyao-nv because of their excellent help last time 🚀
This is unfortunately a known limitation of the Einsum layer in TRT - we only support floating-point types for Einsum equations.
Do you know which operation this Einsum equation implies? Perhaps we can try to substitute the Einsum.
@kevinch-nv: Thanks for the reply! I've been looking at the source code and, as far as I can tell, the only einsum op in the model is actually a float one:
def fixed_pos_embedding(x, seq_dim=1, seq_len=None):
    dim = x.shape[-1]
    if seq_len is None:
        seq_len = x.shape[seq_dim]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2) / dim))
    sinusoid_inp = (
        torch.einsum("i , j -> i j", torch.arange(seq_len, dtype=torch.float), inv_freq).to(x.device).float()
    )
    return torch.sin(sinusoid_inp), torch.cos(sinusoid_inp)
Am I missing something obvious?
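In case it helps with the suggestion to substitute the Einsum: the equation "i , j -> i j" is just an outer product, so an einsum-free variant of the same function is possible. A rough, untested sketch (purely illustrative, not the library's code):

import torch

def fixed_pos_embedding_no_einsum(x, seq_dim=1, seq_len=None):
    # Same math as fixed_pos_embedding above, but the outer product
    # "i , j -> i j" is written with broadcasting, so no Einsum node is exported.
    dim = x.shape[-1]
    if seq_len is None:
        seq_len = x.shape[seq_dim]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2) / dim))
    positions = torch.arange(seq_len, dtype=torch.float)
    sinusoid_inp = (positions[:, None] * inv_freq[None, :]).to(x.device).float()
    return torch.sin(sinusoid_inp), torch.cos(sinusoid_inp)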
It's possible that one of the inputs is being interpreted incorrectly as INT32. Can you provide the converted .onnx model?
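For anyone who wants to check this locally before sharing the model, the dtypes the Einsum node actually receives can be read out of the exported graph. A minimal sketch using the onnx Python package (assuming the exported file is named model.onnx; a model this size may be exported with external data, so the path may need adjusting):

import onnx

# Load the exported graph and run shape inference so intermediate
# tensors appear in value_info together with their element types.
model = onnx.load("model.onnx")
model = onnx.shape_inference.infer_shapes(model)

# Map every named tensor (graph inputs/outputs, intermediates, initializers)
# to its ONNX element type.
elem_type = {}
for vi in list(model.graph.input) + list(model.graph.output) + list(model.graph.value_info):
    elem_type[vi.name] = vi.type.tensor_type.elem_type
for init in model.graph.initializer:
    elem_type[init.name] = init.data_type

# Print each Einsum node with the dtype reported for each of its inputs.
for node in model.graph.node:
    if node.op_type == "Einsum":
        dtypes = [onnx.TensorProto.DataType.Name(elem_type.get(name, 0)) for name in node.input]
        print(node.name, list(node.input), dtypes)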
@oborchers As a (slightly hacky) workaround: since fixed_pos_embedding does not depend on the input, you can just precompute it for each dim you need and a maximum sequence length, and then use something like this:
def fixed_pos_embedding(x, seq_dim=1, seq_len=None):
    dim = x.shape[-1]
    if seq_len is None:
        seq_len = x.shape[seq_dim]
    s = torch.load(f'sin_pos_{dim}.pt').to(x.device)
    c = torch.load(f'cos_pos_{dim}.pt').to(x.device)
    # Truncate to seq_len
    s = s[:seq_len]
    c = c[:seq_len]
    return s, c
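A minimal sketch of the one-off precomputation this loader assumes (the file names are hypothetical and simply mirror the torch.load calls above; the math matches the original fixed_pos_embedding):

import torch

def precompute_pos_embedding(dim, max_seq_len):
    # One-off precomputation of the sin/cos tables the workaround above
    # loads at runtime; same math as the original fixed_pos_embedding.
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2) / dim))
    sinusoid_inp = torch.einsum(
        "i , j -> i j", torch.arange(max_seq_len, dtype=torch.float), inv_freq
    ).float()
    torch.save(torch.sin(sinusoid_inp), f"sin_pos_{dim}.pt")
    torch.save(torch.cos(sinusoid_inp), f"cos_pos_{dim}.pt")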
@oborchers Did you solve this issue? I got the same problem with Salesforce/codegen-16b, which also uses fixed_pos_embedding. However, my colleagues could run the TensorRT engine successfully with codegen-350m.