Assertion bound >= 0 failed in TensorRT 8.6.1 when running build_serialized_network on an NVIDIA Tesla V100 GPU
Description
I am trying to convert a slightly modified VITS model (https://github.com/jaywalnut310/vits), but I get the following error when running builder.build_serialized_network:
[01/29/2024-13:52:34] [TRT] [I] Graph optimization time: 0.629615 seconds.
[01/29/2024-13:52:34] [TRT] [W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[01/29/2024-13:52:34] [TRT] [V] Building graph using backend strategy 0
[01/29/2024-13:52:34] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[01/29/2024-13:52:34] [TRT] [V] Constructing optimization profile number 0 [1/1].
[01/29/2024-13:52:34] [TRT] [E] 2: Assertion bound >= 0 failed.
[01/29/2024-13:52:34] [TRT] [E] 2: [shapeContext.cpp::checkVolume::2923] Error Code 2: Internal Error (Assertion bound >= 0 failed. )
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: NVIDIA Tesla V100
NVIDIA Driver Version: 450.216.04
CUDA Version: 11.6
CUDNN Version: 8.9
Operating System: Ubuntu 22.04.3 inside Docker Container
Python Version (if applicable): 3.11
PyTorch Version (if applicable): 1.13.1
Steps To Reproduce
Have you tried the latest release?: yes
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): yes
What does that error mean? How can I debug this?
Does it work with ONNX Runtime? You can check quickly with polygraphy run model.onnx --onnxrt. If yes, could you please provide a repro? Thanks!
Of course, it works with ONNX Runtime and Polygraphy. Polygraphy output:
[I] RUNNING | Command: /home/user/conda/envs/ekerimov-convert/bin/polygraphy run onnx_500k/generator.onnx --onnxrt
[I] onnxrt-runner-N0-01/29/24-15:45:19 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[W] Input tensor: text_emb [shape=BoundedShape(['batch_axis', 'text_axis', 192], min=None, max=None)] | Will generate data of shape: [1, 1, 192].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: q_labels [shape=BoundedShape(['batch_axis', 'text_axis', 5], min=None, max=None)] | Will generate data of shape: [1, 1, 5].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: bert_emb [shape=BoundedShape(['batch_axis', 'token_axis', 768], min=None, max=None)] | Will generate data of shape: [1, 1, 768].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: speaker_ids [shape=BoundedShape(['batch_axis'], min=None, max=None)] | Will generate data of shape: [1].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: length_scale [shape=BoundedShape(['batch_axis', 'text_axis'], min=None, max=None)] | Will generate data of shape: [1, 1].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: noise_scale [shape=BoundedShape(['batch_axis'], min=None, max=None)] | Will generate data of shape: [1].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: noise_scale_w [shape=BoundedShape(['batch_axis'], min=None, max=None)] | Will generate data of shape: [1].
If this is incorrect, please provide a custom data loader.
[I] onnxrt-runner-N0-01/29/24-15:45:19
---- Inference Input(s) ----
{text_emb [dtype=float32, shape=(1, 1, 192)],
q_labels [dtype=int64, shape=(1, 1, 5)],
bert_emb [dtype=float32, shape=(1, 1, 768)],
speaker_ids [dtype=int64, shape=(1,)],
length_scale [dtype=float32, shape=(1, 1)],
noise_scale [dtype=float32, shape=(1,)],
noise_scale_w [dtype=float32, shape=(1,)]}
[I] onnxrt-runner-N0-01/29/24-15:45:19
---- Inference Output(s) ----
{wav [dtype=float32, shape=(1, 1, 1024)],
attn [dtype=float32, shape=(1, 4, 1)]}
[I] onnxrt-runner-N0-01/29/24-15:45:19 | Completed 1 iteration(s) in 67.58 ms | Average inference time: 67.58 ms.
[I] PASSED | Runtime: 3.193s | Command: /home/user/conda/envs/ekerimov-convert/bin/polygraphy run onnx_500k/generator.onnx --onnxrt
Could you please provide a repro? Thanks!
It would be great if you could try TRT 9.2/9.3 first.
Is there a Python wheel for TRT 9.2/9.3, or do I need trtexec?
The Python wheel should be shipped with the tar package.
I couldn't find a wheel in the tar package from the current repo. I did find archives at https://developer.nvidia.com/nvidia-tensorrt-8x-download, but those are also version 8.6.1.
I uploaded the ONNX model to reproduce the issue: https://drive.google.com/file/d/1nlXTliLV9M7_Z1xiQnUXYP_p8UqbEUBk/view?usp=sharing
and used the following code:
# %%
import tensorrt as trt
import onnx

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# %%
# Parse the ONNX model and report any parser errors.
success = parser.parse_from_file('generator.onnx')
for idx in range(parser.num_errors):
    err = parser.get_error(idx)
    print(err)
if not success:
    exit(0)

# %%
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1024 * 1024 * 1024)
config.flags |= 1 << int(trt.BuilderFlag.DEBUG)
config.clear_flag(trt.BuilderFlag.TF32)

MIN_TIME_AXIS = 1
MAX_TIME_AXIS = 400
MIN_TIME_AXIS_BERT = 1
MAX_TIME_AXIS_BERT = 50
# test input
TEST_TIME_AXIS = 400
TEST_TIME_AXIS_BERT = 50
TEXT_EMB_SIZE = 192
N_Q_FEATURES = 5
BERT_EMB_DIM = 768

# Dynamic shape ranges (min/opt/max) for every network input.
dynamic_shape_config = [
    {"input": "text_emb", "min": (1, MIN_TIME_AXIS, TEXT_EMB_SIZE), "opt": (1, MAX_TIME_AXIS, TEXT_EMB_SIZE), "max": (1, MAX_TIME_AXIS, TEXT_EMB_SIZE)},
    {"input": "q_labels", "min": (1, MIN_TIME_AXIS, N_Q_FEATURES), "opt": (1, MAX_TIME_AXIS, N_Q_FEATURES), "max": (1, MAX_TIME_AXIS, N_Q_FEATURES)},
    {"input": "bert_emb", "min": (1, MIN_TIME_AXIS_BERT, BERT_EMB_DIM), "opt": (1, MAX_TIME_AXIS_BERT, BERT_EMB_DIM), "max": (1, MAX_TIME_AXIS_BERT, BERT_EMB_DIM)},
    {"input": 'speaker_ids', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'noise_scale', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'noise_scale_w', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'length_scale', "min": (1, MIN_TIME_AXIS,), "opt": (1, MAX_TIME_AXIS,), "max": (1, MAX_TIME_AXIS,)},
]

profile = builder.create_optimization_profile()
for s in dynamic_shape_config:
    profile.set_shape(**s)
config.add_optimization_profile(profile)
# config.builder_optimization_level = 0

# Build and serialize the engine; this is where the assertion fires.
ser_engine = builder.build_serialized_network(network, config)
with open('generator.trt', 'wb') as f:
    f.write(ser_engine)
I found that the error is due to this line: https://github.com/jaywalnut310/vits/blob/main/models.py#L517,
or rather, due to attn.squeeze(). Since squeeze doesn't work (https://github.com/NVIDIA/TensorRT/issues/2846), I used attn = attn[:, 0] instead, followed by the matmul.
TensorRT raises the error because of attn[:, 0]; if I comment out this line and everything after it, the conversion works fine.
The shape of attn is (batch_size, 1, t_1, t_2).
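To make the change concrete, here is a minimal sketch of what I mean, assuming the upstream VITS infer() code around that line (attn, m_p, logs_p named as in the repo):

# Upstream VITS (roughly): m_p = torch.matmul(attn.squeeze(1), m_p.transpose(1, 2)).transpose(1, 2)
# Replacement: drop the singleton dim by indexing instead of squeeze().
attn_2d = attn[:, 0]  # (batch_size, t_1, t_2) -- this indexing is what TensorRT trips over
m_p = torch.matmul(attn_2d, m_p.transpose(1, 2)).transpose(1, 2)
logs_p = torch.matmul(attn_2d, logs_p.transpose(1, 2)).transpose(1, 2)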
I also tried trtexec versions 8.6 and 7.x, and the same error occurs.
Test with TRT 9.2:
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_14: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_16: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_20: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_22: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_24: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_14: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_16: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_20: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_22: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_24: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_14: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_16: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_20: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_22: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_24: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [E] Error[4]: [fillNode.cpp::symbolicExecute::109] Error Code 4: Internal Error (/dp/RandomNormalLike: an IFillLayer can compute a shape tensor only for FillOperation::kLINSPACE.)
[02/19/2024-08:07:17] [E] Engine could not be created from network
[02/19/2024-08:07:17] [E] Building engine failed
[02/19/2024-08:07:17] [E] Failed to create engine from model or file.
[02/19/2024-08:07:17] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v9200] # trtexec --onnx=generator.onnx
Looks like we hit a known limitation. What is the real input shape?
I ran it with the following command:
/usr/src/tensorrt/bin/trtexec --onnx=generator.onnx --minShapes=text_emb:1x1x192,q_labels:1x1x5,bert_emb:1x1x768,speaker_ids:1,noise_scale:1,noise_scale_w:1,length_scale:1x1 --optShapes=text_emb:1x400x192,q_labels:1x400x5,bert_emb:1x50x768,speaker_ids:1,noise_scale:1,noise_scale_w:1,length_scale:1x400 --maxShapes=text_emb:1x400x192,q_labels:1x400x5,bert_emb:1x50x768,speaker_ids:1,noise_scale:1,noise_scale_w:1,length_scale:1x400 --workspace=30000
I saw something about RandomNormalLike somewhere, but as far as I remember the solution was just to update TensorRT.
Any updates?
I've encountered a similar issue using TRT 9.2.0.5. It's also about the StochasticDurationPredictor module (https://github.com/jaywalnut310/vits/blob/main/models.py#L17), as in your output with RandomNormalLike:
[02/27/2024-12:21:40] [W] [TRT] /dp/flows.3/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/27/2024-12:21:41] [E] Error[4]: [fillNode.cpp::symbolicExecute::112] Error Code 4: Internal Error (/dp/flows.7/Range: An IFillLayer that computes a shape tensor can have at most one input, and the input must be the first input.)
[02/27/2024-12:21:41] [E] Engine could not be created from network
[02/27/2024-12:21:41] [E] Building engine failed
[02/27/2024-12:21:41] [E] Failed to create engine from model or file.
[02/27/2024-12:21:41] [E] Engine set up failed
Filed internal bug 4535894 for this.
Just an aside: I noticed the network is using what TensorRT calls "zero as placeholder", which indicates the original ONNX file is not setting the attribute "allowzero=1" for Reshape.
When "allowzero=1" is not present, ONNX treats a 0 in a reshape dimension not as a dimension, but as a placeholder for the corresponding input dimension. With dynamic shapes this is almost never what the author intended, and tends to break networks.
Attached is a zip file with a python script that I sometimes use to repair networks where the author did not intend 0 to be a placeholder.
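The core idea of the repair is simple: force allowzero=1 on every Reshape node so that a 0 in the shape tensor is taken literally rather than as a placeholder (this requires opset >= 14). A simplified sketch of that idea, not the full attached script:

import onnx
from onnx import helper

model = onnx.load("generator.onnx")
for node in model.graph.node:
    if node.op_type == "Reshape":
        # Remove any existing allowzero attribute, then set allowzero=1 so a 0 in the
        # shape tensor means "dimension of size zero" rather than "copy the input dim".
        keep = [a for a in node.attribute if a.name != "allowzero"]
        del node.attribute[:]
        node.attribute.extend(keep)
        node.attribute.append(helper.make_attribute("allowzero", 1))

onnx.checker.check_model(model)
onnx.save(model, "generator_allowzero.onnx")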
It doesn't help; I got the same error:
[03/05/2024-10:41:25] [TRT] [I] Graph optimization time: 0.513168 seconds.
[03/05/2024-10:41:25] [TRT] [W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[03/05/2024-10:41:25] [TRT] [V] Building graph using backend strategy 0
[03/05/2024-10:41:25] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[03/05/2024-10:41:25] [TRT] [V] Constructing optimization profile number 0 [1/1].
[03/05/2024-10:41:25] [TRT] [E] 2: Assertion bound >= 0 failed.
[03/05/2024-10:41:25] [TRT] [E] 2: [shapeContext.cpp::checkVolume::2923] Error Code 2: Internal Error (Assertion bound >= 0 failed. )
Maybe this will help: if I split this module into two modules at this line, https://github.com/jaywalnut310/vits/blob/main/models.py#L515, i.e. the first module contains https://github.com/jaywalnut310/vits/blob/main/models.py#L501-L514 and the second contains https://github.com/jaywalnut310/vits/blob/main/models.py#L515-L522, then both modules convert without any errors and I can run them one after the other. But when the two modules are inside one big module, the above error occurs.
There is an error in TensorRT that affects attempts to use IFillLayer with mode kRANDOM_UNIFORM or kRANDOM_NORMAL to construct a shape tensor. The mistake in TensorRT was that one part of the logic incorrectly claimed "I can deliver a shape tensor" and the other part later said "That's not allowed."
The IFillLayers come from the layers /RandomNormalLike and /dp/RandomNormalLike. The first one's output has variable dimensions, which rules it out as a shape tensor, so I think it is /dp/RandomNormalLike_output_0 that is triggering the bug.
The following hack might work. When the output of an IConvolutionLayer is used as a shape tensor, TensorRT handles it correctly, even though the layer says "I can't deliver a shape tensor". The hack is to feed the output of the IFillLayer through a dummy 1x1 IConvolutionLayer that is just an identity operation, i.e. the weights are an identity matrix. TensorRT should then be able to deal with it, because the convolution stops TensorRT from asking the IFillLayer to deliver a shape tensor. One complication is that IConvolutionLayer needs 4D input, so you'll need to add some reshaping to compensate.
So at the TensorRT level, the replacement for the IFillLayer looks something like:
IFillLayer --> IShuffleLayer --> IConvolutionLayer --> IShuffleLayer -->
where the first IShuffleLayer does a 3D to 4D reshape and the second IShuffleLayer does a 4D to 3D reshape. E.g., first shuffle can reshape from [1,2,1] to [1,1,2,1] and second shuffle can reshape the other direction. The convolution sees a channel-dimension of length 1, so the identity matrix is just a 1x1 matrix containing 1.
Of course what I've described is at the TensorRT level. You're probably more interested in an ONNX-level description. At the ONNX level, the hack looks like replacing RandomNormalLike /dp/RandomNormalLike with:
RandomNormalLike --> Reshape --> Conv --> Reshape -->
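If it helps, here is a rough onnx-graphsurgeon sketch of that ONNX-level rewrite. It assumes the node is literally named /dp/RandomNormalLike, that its output is 3D of shape (b, 2, t), and that the new tensor/node names below are unused in your graph; treat it as a starting point rather than a drop-in fix:

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("generator.onnx"))
rnl = next(n for n in graph.nodes if n.name == "/dp/RandomNormalLike")

# Downstream nodes keep reading the original output tensor; the fill layer now
# writes to a fresh tensor that is routed through Reshape -> Conv -> Reshape.
final_out = rnl.outputs[0]
raw = gs.Variable("dp_rnl_raw", dtype=np.float32)
rnl.outputs = [raw]

to_4d = gs.Variable("dp_rnl_4d", dtype=np.float32)
conv_out = gs.Variable("dp_rnl_conv", dtype=np.float32)

shape_4d = gs.Constant("dp_rnl_shape_4d", values=np.array([0, 1, 2, -1], dtype=np.int64))
shape_3d = gs.Constant("dp_rnl_shape_3d", values=np.array([0, 2, -1], dtype=np.int64))
identity_w = gs.Constant("dp_rnl_identity_w", values=np.ones((1, 1, 1, 1), dtype=np.float32))

graph.nodes.extend([
    gs.Node(op="Reshape", name="dp_rnl_reshape_4d", inputs=[raw, shape_4d], outputs=[to_4d]),
    gs.Node(op="Conv", name="dp_rnl_identity_conv", inputs=[to_4d, identity_w], outputs=[conv_out]),
    gs.Node(op="Reshape", name="dp_rnl_reshape_3d", inputs=[conv_out, shape_3d], outputs=[final_out]),
])

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "generator_patched.onnx")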
The RandomNormalLike came from here: https://github.com/jaywalnut310/vits/blob/main/models.py#L90. I replaced that line with:
z = torch.randn(x.size(0), 2, x.size(2)).to(device=x.device, dtype=x.dtype) # (b, 2, t)
z = z.unsqueeze(1) # (b, 1, 2, t)
z = F.conv2d(z, z.new_ones(1, 1, 1, 1)) # identity
z = z[:, 0] # (b, 2, t)
z = z * noise_scale
And it seems to work. Will such a fix be added inside TensorRT?
I'm testing it now; if other errors occur, I'll let you know.
Hi, this issue cannot be fixed in the short term and is still being tracked. To unblock you, we have prepared a WAR (workaround); could you please try it on your side?
WAR:
- upgrade to TRT 10.0
- Add a Cast operation converting FP32 to INT64 before the /Clip operation, as shown in the figure (a graph-surgery sketch of this edit follows below)
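A minimal sketch of that edit with onnx-graphsurgeon, assuming the node is literally named /Clip and that its first input is the FP32 tensor that needs to become INT64 (both assumptions should be checked against the actual graph, e.g. in Netron):

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("generator.onnx"))
clip = next(n for n in graph.nodes if n.name == "/Clip")

# Insert Cast(FP32 -> INT64) in front of the Clip node's first input.
fp32_in = clip.inputs[0]
casted = gs.Variable(fp32_in.name + "_i64", dtype=np.int64)
cast = gs.Node(op="Cast", name="/Clip_pre_cast", inputs=[fp32_in], outputs=[casted],
               attrs={"to": onnx.TensorProto.INT64})
clip.inputs[0] = casted

graph.nodes.append(cast)
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "generator_war.onnx")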
Closing since there is a WAR. Thanks all!