Zero Zeng comments

Results: 571 comments of Zero Zeng

GPU Latency failure for FP16, INT8, mixed precision (FP16+INT8) models of TensorRT 8.6 when running trtexec on GPU A100

@ttyio for above questions.

how to add_ElementWise with two dynamic shape tensor

Do you mean how to do it with the TensorRT API? You can check our developer guide and API documentation.

Are torch.nn.functional methods automatically quantized by pytorch-quantization?

Please check our sample (https://github.com/NVIDIA/TensorRT/tree/release/8.6/tools/pytorch-quantization/examples) and documentation.

Are torch.nn.functional methods automatically quantized by pytorch-quantization?

For example, see https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#document-tutorials/creating_custom_quantized_modules

Custom Attention implementation not well optimised by TensorRT

@nvpohanh @zhenhuaw-me ^ ^

poor performance of batched matmul for larger batch sizes

@nvpohanh any comments? ^ ^

poor performance of batched matmul for larger batch sizes

What if you add an extra batch dimension, so that the inputs look like 1 x old_batch x len x ...?

TensorRT 8.5 deprecated functions

Use `delete runtime` or a smart pointer.
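As a rough illustration of the two options, here is a minimal sketch. It deliberately does not include the TensorRT headers: `FakeRuntime` and `createFakeRuntime()` are hypothetical stand-ins (not TensorRT names) for `nvinfer1::IRuntime` and `nvinfer1::createInferRuntime()`, used only so the snippet compiles on its own. The instance counter exists purely to make the cleanup observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for nvinfer1::IRuntime (assumption for illustration).
// It only counts how many instances are currently alive.
struct FakeRuntime {
    static int live;
    FakeRuntime() { ++live; }
    ~FakeRuntime() { --live; }
};
int FakeRuntime::live = 0;

// Hypothetical factory standing in for nvinfer1::createInferRuntime(logger),
// which also hands back a raw pointer that the caller owns.
FakeRuntime* createFakeRuntime() { return new FakeRuntime(); }

// Option 1: release the object explicitly with `delete`
// instead of calling the deprecated destroy() method.
int liveDuringManualDelete() {
    FakeRuntime* runtime = createFakeRuntime();
    assert(runtime != nullptr);
    int during = FakeRuntime::live;  // one instance is alive here
    delete runtime;
    return during;
}

// Option 2: wrap the raw pointer in std::unique_ptr, which calls
// `delete` automatically when the pointer goes out of scope.
int liveDuringSmartPointer() {
    std::unique_ptr<FakeRuntime> runtime{createFakeRuntime()};
    return FakeRuntime::live;        // the instance is freed after this return
}
```

With the real library, the second option would presumably look like `std::unique_ptr<nvinfer1::IRuntime> runtime{nvinfer1::createInferRuntime(logger)};`, where `logger` is your own `nvinfer1::ILogger` implementation. The smart pointer is usually the safer choice because cleanup also happens on early returns and exceptions.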

TensorRT 8.5 deprecated functions

> Deprecated interfaces will be removed in TensorRT 10.0. This means that if you compile the code with TRT 10.0, you will get a compile error.

Performance Discrepancy between Quantized ONNX Model and FP16 Model

Usually, this is caused by sub-optimal Q/DQ placement. Could you please refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work-with-qat-networks? You can also compare the verbose logs and check the layer-wise precision and performance to find out the reason....