TensorRT issues

trt-engine-explorer plots a cycle in the graph of a model engine

6

This engine was built in FP16 mode, and the TensorRT inference results are correct. I just wonder how TensorRT knows when to stop the inference, as there is a cycle in below...

roachsinai

triaged

Can I implement block quantization through tensorflow-quantization?

4

I want to implement block quantization through tensorflow-quantization. What process should I follow? Or can you provide a simple example?

lqq-feel

triaged

Module:Quantization

Investigating

TensorRT 8.6.2 MatrixMultiply Operator Quantization

4

I am performing QAT quantization on the HRNet OCR model and using TensorRT 8.6.2 to convert and quantize the generated ONNX model with QDQ operations. After conversion, I found that...

dingliangxiansheng

triaged

Module:Quantization

Inserting QDQ has severely impacted the performance of the unquantized Myelin part.

3

## Description I am performing QAT quantization on a complex model. When I insert Q/DQ nodes into the ResNet portion I want to quantize according to the rules, TensorRT can...

zsh4614

triaged

Module:Quantization

[stable diffusion] [unet] [compiling] failed to find implementation ForeignNode[/up_blocks.0/resnets.0/time_mixer/Constant_1_output_0.../conv_act/Mul

7

focusunsink

triaged

Module:Demo

Comparison of inference speed between TRT 8.5.3.1 and TRT 10.5.0.18 on GPU 3060-12G/4060Ti-16G

5

## Description I tried to collect inference time statistics for the segmentation model (bisenetv2) on my machine with TRT-8.5.3.1 vs. TRT-10.5.0.18, but I found a big difference in inference speed between...

gaoyu-cao

Module:Performance

Segmentation fault (core dumped) in TensorRT 10.3 when running execute_async_v3 on GPU H20

5

## Description I used the following commands to convert an ONNX model to a TRT engine, where the input.onnx file is the original model: ``` polygraphy surgeon sanitize --fold-constants ./input.onnx...

simonzgx

triaged

TensorRT Build Error: Tensor Volume Exceeds 2^31 Limit for Large Fixed Shapes (Super-Resolution/Restoration Models)

2

Environment: • TensorRT Version: 10.9.0.34 • GPU Type: NVIDIA GeForce RTX 3090 (24GB VRAM) • Nvidia Driver Version: 572.83 • CUDA Version: 12.8.1 • CUDNN Version: 9.8.0 • Operating System:...

zelenooki87

Module:Engine Build

triaged

softmax + argmax takes too much time

3

![Image](https://github.com/user-attachments/assets/f10bc335-a18c-4f79-a6c8-c00752dda788) ![Image](https://github.com/user-attachments/assets/a2c8e57e-6748-4e0c-82c8-e06195203a5b) I would like to ask: would implementing the Argmax operation with a TensorRT plugin or a custom CUDA kernel be faster?

Xiao-Hu-Z

triaged

How does TensorRT leverage attention masks to speed up inference?

4

Hello team, thanks for all the great work. I am training a model where I am providing tile-wise constant attention masks (see picture below). At inference time, how will TensorRT...

MatthieuToulemont

triaged
