TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
PyTorch now supports flash attention v2, which is 2 times faster than flash attention: https://pytorch.org/blog/pytorch2-2/ So I'm wondering whether TensorRT 9.2 already supports flash attention v2, or I have...
## Description I customized TensorRT's Col2Im plugin, recompiled the source code of TensorRT 8.5, and generated a new nvinfer_plugin library. ## Environment **TensorRT Version**: 9.2.0.5 **NVIDIA GPU**: GeForce GTX 1080 Ti...
## Description I am trying to convert a slightly modified version of [YOSO](https://github.com/hujiecpp/YOSO) from PyTorch to TRT. I cannot make it work with batch size 8. Can you please point me...
BERT inference with TensorRT is slower than with ONNX Runtime: TensorRT takes 10 ms and ONNX Runtime takes 6 ms. The model is just a simple BERT classification model. Could someone help me? ONNX code ``` import numpy as np import...
## Description I tried to convert my onnx model to .trt but trtexec segfaulted. See attached log output of trtexec ... the program segfaults after the final line you see...
Hello, thanks for all the great work! Some of my models require bfloat16 at inference time. I saw it was added in TensorRT 9 with TensorRT-LLM, and I was...
## Description When comparing Multihead Attention between Torch 2.2 and TensorRT 9.2 on A100-SXM4-40G, I found that for certain sizes the resulting engine does not use `_gemm_mha_v2` tactics. When not...
## Description The tmp values are precomputed for reuse; tmp is calculated as below: https://github.com/NVIDIA/TensorRT/blob/78245b0ac2af9a208ed02e5257bfd3ee7ae8a88d/plugin/disentangledAttentionPlugin/disentangledKernel.cu#L122 The sequence length `dimResult.y` is wrongly used as the max relative position. But according to...
## Description Using trtexec to convert an ONNX model to TRT failed, with no further error information. How can I solve it? ```bash [02/20/2024-10:56:21] [E] Error[2]: Assertion engine failed. [02/20/2024-10:56:21] [E] Error[2]:...
The code below shows that the numpy part works perfectly, but using a torch GPU tensor reports an error. My actual usage scenario is to decode video using VPF first,...