Zero Zeng comments

Results: 571 comments of Zero Zeng

Tensorrt fp32 inference slower than pytorch on tesla T4 for GroundingDINO

T4 is a pretty old GPU, so maybe we just don't have many optimized kernels for it?

Conversion from onnx hurts accuracy when model is using fp16

It might be caused by LayerNorm overflow in FP16, and you should see a TRT warning when building the engine. You can try falling back the LayerNorm layers to FP32.
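A minimal sketch of that fallback, assuming the TensorRT Python API attributes `num_layers`, `get_layer`, `name`, `precision`, `num_outputs` and `set_output_type`. The helper name `force_fp32` and the keyword match on layer names are assumptions made for illustration; the actual layer names depend on how the ONNX model was exported, so inspect the build log or the network before relying on them.

```python
def force_fp32(network, fp32_dtype, keywords=("LayerNorm", "layer_norm")):
    """Pin every layer whose name contains a keyword to FP32.

    `network` is expected to behave like a TensorRT INetworkDefinition and
    `fp32_dtype` like trt.float32. The keywords are hypothetical defaults.
    Returns the names of the layers that were pinned.
    """
    pinned = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if any(key in layer.name for key in keywords):
            # Ask for FP32 computation in this layer.
            layer.precision = fp32_dtype
            # Keep the layer outputs in FP32 as well.
            for j in range(layer.num_outputs):
                layer.set_output_type(j, fp32_dtype)
            pinned.append(layer.name)
    return pinned


# Intended use with TensorRT (not executed here):
#   import tensorrt as trt
#   config.set_flag(trt.BuilderFlag.FP16)
#   config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
#   print(force_fp32(network, trt.float32))
```

Without a precision-constraint flag on the builder config, the builder may treat the per-layer precision as a hint only, so the two settings are usually applied together.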

Conversion from onnx hurts accuracy when model is using fp16

> any reason why these layers would overflow in trt but not in onnx?

FP16 has a smaller range than FP32. It's caused by the internal implementation; onnxruntime doesn't have much...
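To make the range limit concrete, here is a small sketch that emulates FP16 rounding with Python's `struct` module (format `e` is IEEE 754 half precision). It only illustrates how a sum of squares, the kind of intermediate a normalization layer computes, can leave the FP16 range. It does not reproduce TensorRT's or onnxruntime's actual kernels, and the activation values are made up.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round x to the nearest FP16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


def sum_of_squares_fp16(values):
    """Accumulate x*x with every intermediate result rounded to FP16."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v * v))
    return acc


# Hypothetical activations: 300 * 300 = 90000 already exceeds FP16_MAX.
activations = [300.0] * 8

print(sum_of_squares_fp16(activations))   # overflows to inf
print(sum(v * v for v in activations))    # stays finite in higher precision
```

Whether a real model hits this depends on which intermediates a given runtime keeps in FP16, which is why two runtimes can behave differently on the same ONNX file.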

Conversion from onnx hurts accuracy when model is using fp16

Could you please try TRT 8.6 GA? The result doesn't look too bad to me. ``` [I] Absolute Difference | Stats: mean=0.0068461, std-dev=0.017569, var=0.00030868, median=0.0038256, min=2.5518e-09 at (5, 67, 193),...

Conversion from onnx hurts accuracy when model is using fp16

``` [I] Error Metrics: encoder_last_hidden_state [I] Minimum Required Tolerance: elemwise error | [abs=0.40578] OR [rel=1.8275e+15] (requirements may be lower if both abs/rel tolerances are set) [I] Absolute Difference | Stats:...

Conversion from onnx hurts accuracy when model is using fp16

Yes, or you can just use our tensorrt docker image.

Segmentation fault of TensorRT 8.6 when running `trtexec --onnx=<file>` on GPU V100

Could you please try TRT 9.2? This looks like an issue that has already been fixed. Links:
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-11.8.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-12.2.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.ubuntu-22.04.aarch64-gnu.cuda-12.2.tar.gz
``` [I] Finished engine building in 42.643 seconds [I] trt-runner-N0-01/27/24-07:47:50 ---- Inference Input(s) ---- {img...

Using tensorflow quantisation toolkit, how to export to tflite correctly?

Error Code 4: Internal Error of TensorRT 9.1.0 when running blip-large on GPU Tesla T4