Assertion bound >= 0 failed of TensorRT 8.6.1 when running build_serialized_network on GPU nvidia tesla v100

Open elch10 opened this issue 1 year ago • 17 comments

Description

I am trying to convert a slightly modified VITS model (https://github.com/jaywalnut310/vits), but I get an error when running builder.build_serialized_network:

[01/29/2024-13:52:34] [TRT] [I] Graph optimization time: 0.629615 seconds.
[01/29/2024-13:52:34] [TRT] [W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[01/29/2024-13:52:34] [TRT] [V] Building graph using backend strategy 0
[01/29/2024-13:52:34] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[01/29/2024-13:52:34] [TRT] [V] Constructing optimization profile number 0 [1/1].
[01/29/2024-13:52:34] [TRT] [E] 2: Assertion bound >= 0 failed. 
[01/29/2024-13:52:34] [TRT] [E] 2: [shapeContext.cpp::checkVolume::2923] Error Code 2: Internal Error (Assertion bound >= 0 failed. )

Environment

TensorRT Version: 8.6.1

NVIDIA GPU: Nvidia Tesla v100

NVIDIA Driver Version: 450.216.04

CUDA Version: 11.6

CUDNN Version: 8.9

Operating System: Ubuntu 22.04.3 inside Docker Container

Python Version (if applicable): 3.11

PyTorch Version (if applicable): 1.13.1

Steps To Reproduce

Have you tried the latest release?: yes

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): yes

elch10 avatar Jan 29 '24 12:01 elch10

What does that error mean? How can I debug this?

elch10 avatar Jan 29 '24 13:01 elch10

Does it work with onnxruntime? You can check quickly with polygraphy run model.onnx --onnxrt. If it does, could you please provide a repro? Thanks!

zerollzeng avatar Jan 30 '24 02:01 zerollzeng

Of course, it works with onnxruntime and polygraphy. Polygraphy output:

[I] RUNNING | Command: /home/user/conda/envs/ekerimov-convert/bin/polygraphy run onnx_500k/generator.onnx --onnxrt
[I] onnxrt-runner-N0-01/29/24-15:45:19  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[W] Input tensor: text_emb [shape=BoundedShape(['batch_axis', 'text_axis', 192], min=None, max=None)] | Will generate data of shape: [1, 1, 192].
    If this is incorrect, please provide a custom data loader.
[W] Input tensor: q_labels [shape=BoundedShape(['batch_axis', 'text_axis', 5], min=None, max=None)] | Will generate data of shape: [1, 1, 5].
    If this is incorrect, please provide a custom data loader.
[W] Input tensor: bert_emb [shape=BoundedShape(['batch_axis', 'token_axis', 768], min=None, max=None)] | Will generate data of shape: [1, 1, 768].
    If this is incorrect, please provide a custom data loader.
[W] Input tensor: speaker_ids [shape=BoundedShape(['batch_axis'], min=None, max=None)] | Will generate data of shape: [1].
    If this is incorrect, please provide a custom data loader.
[W] Input tensor: length_scale [shape=BoundedShape(['batch_axis', 'text_axis'], min=None, max=None)] | Will generate data of shape: [1, 1].
    If this is incorrect, please provide a custom data loader.
[W] Input tensor: noise_scale [shape=BoundedShape(['batch_axis'], min=None, max=None)] | Will generate data of shape: [1].
    If this is incorrect, please provide a custom data loader.
[W] Input tensor: noise_scale_w [shape=BoundedShape(['batch_axis'], min=None, max=None)] | Will generate data of shape: [1].
    If this is incorrect, please provide a custom data loader.
[I] onnxrt-runner-N0-01/29/24-15:45:19 
    ---- Inference Input(s) ----
    {text_emb [dtype=float32, shape=(1, 1, 192)],
     q_labels [dtype=int64, shape=(1, 1, 5)],
     bert_emb [dtype=float32, shape=(1, 1, 768)],
     speaker_ids [dtype=int64, shape=(1,)],
     length_scale [dtype=float32, shape=(1, 1)],
     noise_scale [dtype=float32, shape=(1,)],
     noise_scale_w [dtype=float32, shape=(1,)]}
[I] onnxrt-runner-N0-01/29/24-15:45:19 
    ---- Inference Output(s) ----
    {wav [dtype=float32, shape=(1, 1, 1024)],
     attn [dtype=float32, shape=(1, 4, 1)]}
[I] onnxrt-runner-N0-01/29/24-15:45:19  | Completed 1 iteration(s) in 67.58 ms | Average inference time: 67.58 ms.
[I] PASSED | Runtime: 3.193s | Command: /home/user/conda/envs/ekerimov-convert/bin/polygraphy run onnx_500k/generator.onnx --onnxrt

elch10 avatar Jan 30 '24 04:01 elch10

Could you please provide a repro? Thanks!

zerollzeng avatar Feb 01 '24 14:02 zerollzeng

It would be great if you could try TRT 9.2/9.3 first.

zerollzeng avatar Feb 01 '24 14:02 zerollzeng

Is there a Python wheel for TRT 9.2/9.3, or do I need trtexec?

elch10 avatar Feb 02 '24 09:02 elch10

The Python wheel should be shipped with the tar package.

zerollzeng avatar Feb 07 '24 09:02 zerollzeng

I couldn't find a wheel in the tar package of the current repo. I did find wheels in the archives at https://developer.nvidia.com/nvidia-tensorrt-8x-download, but those are also version 8.6.1.

I uploaded the ONNX model to reproduce the issue: https://drive.google.com/file/d/1nlXTliLV9M7_Z1xiQnUXYP_p8UqbEUBk/view?usp=sharing

elch10 avatar Feb 07 '24 11:02 elch10

And I use this code:

# %%
import tensorrt as trt
import onnx

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)


# %%
success = parser.parse_from_file('generator.onnx')
for idx in range(parser.num_errors):
    err = parser.get_error(idx)
    print(err)

if not success:
    raise SystemExit(1)  # non-zero exit status: parsing failed

# %%
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1024 * 1024 * 1024)
config.flags |= 1 << int(trt.BuilderFlag.DEBUG)
config.clear_flag(trt.BuilderFlag.TF32)



MIN_TIME_AXIS = 1
MAX_TIME_AXIS = 400

MIN_TIME_AXIS_BERT = 1
MAX_TIME_AXIS_BERT = 50

# test input
TEST_TIME_AXIS = 400
TEST_TIME_AXIS_BERT = 50

TEXT_EMB_SIZE = 192
N_Q_FEATURES = 5
BERT_EMB_DIM = 768


dynamic_shape_config = [
    {"input": "text_emb", "min": (1, MIN_TIME_AXIS, TEXT_EMB_SIZE), "opt": (1, MAX_TIME_AXIS, TEXT_EMB_SIZE), "max": (1, MAX_TIME_AXIS, TEXT_EMB_SIZE)},
    {"input": "q_labels", "min": (1, MIN_TIME_AXIS, N_Q_FEATURES), "opt": (1, MAX_TIME_AXIS, N_Q_FEATURES), "max": (1, MAX_TIME_AXIS, N_Q_FEATURES)},
    {"input": "bert_emb", "min": (1, MIN_TIME_AXIS_BERT, BERT_EMB_DIM), "opt": (1, MAX_TIME_AXIS_BERT, BERT_EMB_DIM), "max": (1, MAX_TIME_AXIS_BERT, BERT_EMB_DIM)},
    {"input": 'speaker_ids', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'noise_scale', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'noise_scale_w', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'length_scale', "min": (1, MIN_TIME_AXIS,), "opt": (1, MAX_TIME_AXIS,), "max": (1, MAX_TIME_AXIS,)},
]

profile = builder.create_optimization_profile()
for s in dynamic_shape_config:
    profile.set_shape(**s)

config.add_optimization_profile(profile)
# config.builder_optimization_level = 0


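ser_engine = builder.build_serialized_network(network, config)
if ser_engine is None:
    # build_serialized_network returns None when the build fails
    raise SystemExit("engine build failed")
with open('generator.trt', 'wb') as f:
    f.write(ser_engine)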



elch10 avatar Feb 07 '24 11:02 elch10

I found that the error comes from this line: https://github.com/jaywalnut310/vits/blob/main/models.py#L517, or rather from attn.squeeze(). Because squeeze doesn't work (https://github.com/NVIDIA/TensorRT/issues/2846), I used attn = attn[:, 0] followed by matmul, and TRT raises the error because of attn[:, 0]. If I comment out this line and everything after it, the conversion works. The shape of attn is (batch_size, 1, t_1, t_2).

elch10 avatar Feb 07 '24 16:02 elch10

I also tried trtexec from versions 8.6 and 7.x, and the same error occurs.

elch10 avatar Feb 08 '24 05:02 elch10

Test with TRT 9.2:

[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_14: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_16: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_20: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_22: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_24: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.7/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_14: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_16: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_20: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_22: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_24: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.5/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_14: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_16: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_20: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_22: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_24: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [W] [TRT] /dp/flows.3/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/19/2024-08:07:17] [E] Error[4]: [fillNode.cpp::symbolicExecute::109] Error Code 4: Internal Error (/dp/RandomNormalLike: an IFillLayer can compute a shape tensor only for FillOperation::kLINSPACE.)
[02/19/2024-08:07:17] [E] Engine could not be created from network
[02/19/2024-08:07:17] [E] Building engine failed
[02/19/2024-08:07:17] [E] Failed to create engine from model or file.
[02/19/2024-08:07:17] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v9200] # trtexec --onnx=generator.onnx

Looks like we hit a known limitation. What is the real input shape?

zerollzeng avatar Feb 19 '24 08:02 zerollzeng

I run it with this command: /usr/src/tensorrt/bin/trtexec --onnx=generator.onnx --minShapes=text_emb:1x1x192,q_labels:1x1x5,bert_emb:1x1x768,speaker_ids:1,noise_scale:1,noise_scale_w:1,length_scale:1x1 --optShapes=text_emb:1x400x192,q_labels:1x400x5,bert_emb:1x50x768,speaker_ids:1,noise_scale:1,noise_scale_w:1,length_scale:1x400 --maxShapes=text_emb:1x400x192,q_labels:1x400x5,bert_emb:1x50x768,speaker_ids:1,noise_scale:1,noise_scale_w:1,length_scale:1x400 --workspace=30000
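For anyone reproducing this, the shape arguments in the command above mirror the optimization profile from the Python build script earlier in the thread. The following is only a sketch of how those strings could be generated from the same list of dicts so the two stay in sync; the helper name shapes_flag is made up for illustration and is not part of TensorRT or trtexec.

```python
def shapes_flag(profile, key):
    """Format the 'min', 'opt' or 'max' shapes as a trtexec shape spec string."""
    return ",".join(
        f"{entry['input']}:{'x'.join(str(d) for d in entry[key])}"
        for entry in profile
    )


# Same values as dynamic_shape_config in the build script, with constants inlined.
profile = [
    {"input": "text_emb", "min": (1, 1, 192), "opt": (1, 400, 192), "max": (1, 400, 192)},
    {"input": "q_labels", "min": (1, 1, 5), "opt": (1, 400, 5), "max": (1, 400, 5)},
    {"input": "bert_emb", "min": (1, 1, 768), "opt": (1, 50, 768), "max": (1, 50, 768)},
    {"input": "speaker_ids", "min": (1,), "opt": (1,), "max": (1,)},
    {"input": "noise_scale", "min": (1,), "opt": (1,), "max": (1,)},
    {"input": "noise_scale_w", "min": (1,), "opt": (1,), "max": (1,)},
    {"input": "length_scale", "min": (1, 1), "opt": (1, 400), "max": (1, 400)},
]

# One flag per profile bound, in the form trtexec expects.
flags = [f"--{key}Shapes={shapes_flag(profile, key)}" for key in ("min", "opt", "max")]
print(" ".join(flags))
```

The printed flags can be pasted after --onnx=generator.onnx in the trtexec call.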

elch10 avatar Feb 19 '24 08:02 elch10

I saw something about RandomNormalLike somewhere, but as I remember the solution was just to update TensorRT.

elch10 avatar Feb 19 '24 08:02 elch10

Any updates? I've encountered a similar issue using TRT 9.2.0.5. It also involves the StochasticDurationPredictor module (https://github.com/jaywalnut310/vits/blob/main/models.py#L17), as in your output with RandomNormalLike:

[02/27/2024-12:21:40] [W] [TRT] /dp/flows.3/Reshape_26: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[02/27/2024-12:21:41] [E] Error[4]: [fillNode.cpp::symbolicExecute::112] Error Code 4: Internal Error (/dp/flows.7/Range: An IFillLayer that computes a shape tensor can have at most one input, and the input must be the first input.)
[02/27/2024-12:21:41] [E] Engine could not be created from network
[02/27/2024-12:21:41] [E] Building engine failed
[02/27/2024-12:21:41] [E] Failed to create engine from model or file.
[02/27/2024-12:21:41] [E] Engine set up failed

elch10 avatar Feb 27 '24 09:02 elch10

Filed internal bug 4535894 for this.

zerollzeng avatar Feb 28 '24 04:02 zerollzeng

Just an aside: I noticed the network is using what TensorRT calls "zero as placeholder", which indicates the original ONNX file is not setting the attribute "allowzero=1" for Reshape.

When "allowzero=1" is not present, ONNX treats a 0 in a reshape dimension not as a dimension, but as a placeholder for the corresponding input dimension. With dynamic shapes this is almost never what the author intended, and tends to break networks.
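To make the placeholder behavior concrete, here is a small pure-Python sketch of how the ONNX Reshape operator resolves its output shape with and without allowzero. The function resolve_reshape is a hypothetical helper written for this illustration, not an ONNX or TensorRT API, and it does no validation of the element count.

```python
def resolve_reshape(input_shape, target, allowzero=0):
    """Resolve the output shape of an ONNX Reshape from its target shape."""
    out = []
    for i, d in enumerate(target):
        if d == 0 and not allowzero:
            out.append(input_shape[i])  # 0 means "copy input dimension i"
        else:
            out.append(d)               # literal value (0 stays 0 when allowzero=1)
    if -1 in out:
        # A single -1 is inferred from the remaining element count.
        known = 1
        for d in out:
            if d != -1:
                known *= d
        total = 1
        for d in input_shape:
            total *= d
        out[out.index(-1)] = total // known
    return out


print(resolve_reshape([2, 3, 4], [0, -1]))             # 0 copies the leading 2
print(resolve_reshape([5, 0], [0, 5], allowzero=1))    # 0 is a real empty dimension
print(resolve_reshape([5, 0], [0, 5]))                 # 0 silently becomes 5
```

The last call shows the failure mode: a target dimension that was computed as zero at runtime is replaced by an input dimension, so the requested shape no longer matches the data.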

Attached is a zip file with a python script that I sometimes use to repair networks where the author did not intend 0 to be a placeholder.

allowzero.zip

ArchRobison avatar Mar 01 '24 22:03 ArchRobison

It doesn't help. I get the same error:

[03/05/2024-10:41:25] [TRT] [I] Graph optimization time: 0.513168 seconds.
[03/05/2024-10:41:25] [TRT] [W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[03/05/2024-10:41:25] [TRT] [V] Building graph using backend strategy 0
[03/05/2024-10:41:25] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[03/05/2024-10:41:25] [TRT] [V] Constructing optimization profile number 0 [1/1].
[03/05/2024-10:41:25] [TRT] [E] 2: Assertion bound >= 0 failed. 
[03/05/2024-10:41:25] [TRT] [E] 2: [shapeContext.cpp::checkVolume::2923] Error Code 2: Internal Error (Assertion bound >= 0 failed. )

elch10 avatar Mar 05 '24 07:03 elch10

Maybe this will help: if I split this module into two at this line https://github.com/jaywalnut310/vits/blob/main/models.py#L515, so that the first module contains https://github.com/jaywalnut310/vits/blob/main/models.py#L501-L514 and the second contains https://github.com/jaywalnut310/vits/blob/main/models.py#L515-L522, then both modules convert without any errors and I can run them one after the other. But when the two parts are inside one big module, the above error occurs.

elch10 avatar Mar 05 '24 08:03 elch10

There is an error in TensorRT that affects attempts to use IFillLayer with mode kRANDOM_UNIFORM or kRANDOM_NORMAL to construct a shape tensor. The mistake in TensorRT was that one part of the logic incorrectly claimed "I can deliver a shape tensor" and the other part later said "That's not allowed."

The FillLayers are coming from layers /RandomNormalLike and "/dp/RandomNormalLike". The first one's output has variable dimensions, which knocks it out from consideration as a shape tensor, so I think it's /dp/RandomNormalLike_output_0 that is triggering the bug.

The following hack might work. When the output from IConvolutionLayer is used as a shape tensor, TensorRT correctly deals with it, even though the layer says "I can't deliver a shape tensor". The hack is to feed the output from the IFillLayer through a dummy 1x1 IConvolutionLayer that is just an identity operation, i.e. the weights are an identity matrix. TensorRT should be able to deal with it, because the convolution will stop TensorRT from asking IFillLayer to deliver a shape tensor. A complication is that IConvolutionLayer needs 4D input, so you'll need to add some reshaping to compensate.

So at the TensorRT level, the replacement for the IFillLayer looks something like:

IFillLayer --> IShuffleLayer --> IConvolutionLayer --> IShuffleLayer -->

where the first IShuffleLayer does a 3D to 4D reshape and the second IShuffleLayer does a 4D to 3D reshape. E.g., first shuffle can reshape from [1,2,1] to [1,1,2,1] and second shuffle can reshape the other direction. The convolution sees a channel-dimension of length 1, so the identity matrix is just a 1x1 matrix containing 1.

Of course what I've described is at the TensorRT level. You're probably more interested in an ONNX-level description. At the ONNX level, the hack looks like replacing RandomNormalLike /dp/RandomNormalLike with:

RandomNormalLike --> Reshape --> Conv --> Reshape -->
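As a sanity check that the reshape, 1x1 convolution, reshape chain really is a no-op on the values, here is a minimal sketch on plain nested lists. It is not TensorRT or ONNX code; the helper names are invented for the illustration, and the input is a fixed stand-in for the random fill output.

```python
def unsqueeze_channel(x):
    """[N, A, B] -> [N, 1, A, B]: add a channel dimension of length 1."""
    return [[sample] for sample in x]


def conv1x1(x, weight):
    """1x1 convolution over [N, C_in, H, W] with weight[c_out][c_in], no bias."""
    out = []
    for sample in x:
        height, width = len(sample[0]), len(sample[0][0])
        out.append([
            [[sum(row[c] * sample[c][i][j] for c in range(len(sample)))
              for j in range(width)] for i in range(height)]
            for row in weight
        ])
    return out


def squeeze_channel(x):
    """[N, 1, A, B] -> [N, A, B]: drop the length-1 channel dimension."""
    return [sample[0] for sample in x]


z = [[[0.5], [-1.25]]]   # shape [1, 2, 1], stand-in for the fill layer output
identity = [[1.0]]       # 1x1 identity matrix: one output channel, one input channel

out = squeeze_channel(conv1x1(unsqueeze_channel(z), identity))
print(out == z)
```

Because the channel dimension has length 1, the identity matrix collapses to the single weight 1.0, which is why the values pass through unchanged.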

ArchRobison avatar Mar 07 '24 19:03 ArchRobison

RandomNormalLike came from here: https://github.com/jaywalnut310/vits/blob/main/models.py#L90. I replaced that line with:

      z = torch.randn(x.size(0), 2, x.size(2)).to(device=x.device, dtype=x.dtype) # (b, 2, t)
      z = z.unsqueeze(1) # (b, 1, 2, t)
      z = F.conv2d(z, z.new_ones(1, 1, 1, 1)) # identity
      z = z[:, 0] # (b, 2, t)

      z = z * noise_scale

And it seems to work. Will such a solution be added inside TensorRT?

I'm testing now; if other errors occur, I'll let you know.

elch10 avatar Mar 26 '24 10:03 elch10

Hi, this issue cannot be fixed in the short term, and it is still being tracked. To unblock you, we have prepared a workaround (WAR). Could you please try it on your side?

WAR:

  1. Upgrade to TRT 10.0
  2. Add a Cast operation converting FP32 to INT64 before the /Clip operation, as shown in the attached figure

zerollzeng avatar Apr 15 '24 07:04 zerollzeng

Closing since there is a WAR. Thanks all!

ttyio avatar Jul 02 '24 17:07 ttyio