TensorRT issues

Does TensorRT 9.2 support flash attention v2?

7

PyTorch now supports flash attention v2, which is 2 times faster than flash attention: https://pytorch.org/blog/pytorch2-2/ So I'm wondering whether TensorRT 9.2 already supports flash attention v2, or I have...

linkedqueue

triaged

Customized TensorRT operator Col2Im, but parsing fails with TensorRT 9.2

8

## Description I customized TensorRT's Col2Im plugin, recompiled the source code of TensorRT8.5, and generated a new nvinfer_plugin library. ## Environment **TensorRT Version**: 9.2.0.5 **NVIDIA GPU**: GeForce GTX 1080 Ti...

demuxin

ONNX

triaged

Failure due to batch size of TensorRT 8.6.1 when running inference on NVIDIA RTX A2000 8GB Laptop GPU

1

## Description I am trying to convert a slightly modified version of [YOSO](https://github.com/hujiecpp/YOSO) from PyTorch to TRT. I cannot make it work with batch size 8. Can you please point me...

nullkatar

triaged

BERT inference with TensorRT is slower than with onnxruntime

1

BERT inference with TensorRT is slower than with onnxruntime: TensorRT takes 10 ms, onnxruntime takes 6 ms. The model is just a simple BERT classification model. Could someone help me? ONNX code: ``` import numpy as np import...

yan123456jie

triaged

Segmentation Fault failure of TensorRT 8.6.1 when converting onnx on GPU GeForce RTX 3050 Ti

2

## Description I tried to convert my onnx model to .trt but trtexec segfaulted. See attached log output of trtexec ... the program segfaults after the final line you see...

steve-volley

triaged

Question: Will TensorRT 9 be available in the 23.11 NGC container?

7

Hello, thanks for all the great work! Some of my models require bfloat16 at inference time. I saw it was added in TensorRT 9 with TensorRT-LLM, and I was...

MatthieuToulemont

triaged

Unfused Multihead attention TensorRT 9.2 is 2x slower than PyTorch 2.2 on GPU A100-SXM4-40GB

3

## Description When comparing Multihead Attention between Torch 2.2 and TensorRT 9.2 on A100-SXM4-40G, I found that for certain sizes the resulting engine does not use `_gemm_mha_v2` tactics. When not...

haijieg

triaged

internal-bug-tracked

Implementation bug in Disentangled Attention Plugin

4

## Description The tmp values are precomputed for re-use; tmp is calculated as below: https://github.com/NVIDIA/TensorRT/blob/78245b0ac2af9a208ed02e5257bfd3ee7ae8a88d/plugin/disentangledAttentionPlugin/disentangledKernel.cu#L122 The sequence length `dimResult.y` is wrongly used as the max relative position. But according to...

fillmore

triaged

internal-bug-tracked

Assertion engine failed

4

## Description Using trtexec to convert an ONNX model to TRT failed with no further error information. How can I solve it? ```bash [02/20/2024-10:56:21] [E] Error[2]: Assertion engine failed. [02/20/2024-10:56:21] [E] Error[2]:...

PWZER

triaged

How to use TensorRT with a torch tensor on CUDA

1

The code below shows that the numpy part works perfectly, but using a torch GPU tensor reports an error. My actual usage scenario is to decode video using VPF first,...

chenj133

triaged
