Zero Zeng

Results 571 comments of Zero Zeng

T4 is a pretty old GPU; maybe we just don't have many optimized kernels for it?

It might be caused by a LayerNorm overflow in FP16; you should see a TRT warning when building the engine. You can try falling back the LayerNorm layers to FP32.
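To see why LayerNorm is a typical overflow candidate, here is a minimal sketch of its variance step with every intermediate rounded to FP16. This is an illustration only, not TensorRT's kernel: the helper names `to_fp16` and `sum_sq_dev` and the activation values are made up for the example, and Python floats stand in for FP32.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value; out-of-range values become +/-inf."""
    if math.isinf(x) or math.isnan(x):
        return x
    try:
        # The "e" struct format is IEEE 754 binary16.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def sum_sq_dev(xs, cast):
    """Sum of squared deviations from the mean, casting each intermediate."""
    mean = cast(sum(xs) / len(xs))
    acc = 0.0
    for x in xs:
        d = cast(x - mean)
        acc = cast(acc + cast(d * d))
    return acc


# Hypothetical activations: moderate in magnitude, yet 300**2 = 90000 > FP16_MAX.
activations = [300.0, -300.0, 280.0, -280.0]

print("FP16 path:", sum_sq_dev(activations, to_fp16))
print("wide path:", sum_sq_dev(activations, float))
```

The inputs themselves fit in FP16 comfortably; it is the squared term that leaves the representable range, which is why keeping only the normalization layers in FP32 can be enough.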

> any reason why these layers would overflow in trt but not in onnx?

FP16 has a smaller range than FP32. It's caused by the internal implementation; onnxruntime doesn't have much...

Could you please try TRT 8.6 GA? The result doesn't look too bad to me.

```
[I] Absolute Difference | Stats: mean=0.0068461, std-dev=0.017569, var=0.00030868, median=0.0038256, min=2.5518e-09 at (5, 67, 193),...
```

```
[I] Error Metrics: encoder_last_hidden_state
[I] Minimum Required Tolerance: elemwise error | [abs=0.40578] OR [rel=1.8275e+15] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats:...
```
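A relative tolerance as large as `1.8275e+15` usually says more about near-zero reference values than about a badly wrong output. The sketch below shows the effect with made-up numbers. It is not the comparison tool's actual code; the function name `min_required_tolerances` and the convention `rel = |out - ref| / |ref|` are assumptions for illustration.

```python
import math


def min_required_tolerances(out, ref):
    """Return (max absolute error, max relative error) over all elements."""
    max_abs = 0.0
    max_rel = 0.0
    for o, r in zip(out, ref):
        err = abs(o - r)
        if r != 0:
            rel = err / abs(r)
        else:
            # A zero reference makes any nonzero error infinitely large in relative terms.
            rel = math.inf if err else 0.0
        max_abs = max(max_abs, err)
        max_rel = max(max_rel, rel)
    return max_abs, max_rel


# Hypothetical values: the last reference element is almost zero.
reference = [1.0, 2.0, 1e-16]
output = [1.01, 1.98, 0.2]

abs_tol, rel_tol = min_required_tolerances(output, reference)
print(f"abs={abs_tol:.5g} rel={rel_tol:.5g}")
```

One tiny reference element is enough to push the relative figure past `1e15` while the absolute error stays modest, so the absolute statistics are usually the more informative ones in a report like this.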

Yes, or you can just use our TensorRT Docker image.

Could you please try TRT 9.2? Looks like a fixed issue.

Link:
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-11.8.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-12.2.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.ubuntu-22.04.aarch64-gnu.cuda-12.2.tar.gz

```
[I] Finished engine building in 42.643 seconds
[I] trt-runner-N0-01/27/24-07:47:50
---- Inference Input(s) ----
{img...
```

> "[E] 4: [network.cpp::validate::3640] Error Code 4: Internal Error (image_embeds: for dimension number 2 in profile 0 does not match network definition (got min=768, opt=768, max=768), expected min=opt=max=1024).)"

It means...
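The quoted error states the rule it enforces: where the network definition fixes a dimension (here to 1024), the profile's min, opt, and max must all equal that value. The sketch below reproduces that check in plain Python. It is not TensorRT's validation code; the function `check_profile`, the dictionary layout, and the example shapes are hypothetical, with `-1` standing for a dynamic dimension.

```python
def check_profile(name, network_shape, profile):
    """Return a list of mismatch messages for one input tensor.

    network_shape: tuple of ints, -1 marks a dynamic dimension.
    profile: dict with "min", "opt", "max" shape tuples.
    """
    errors = []
    for i, dim in enumerate(network_shape):
        lo, opt, hi = profile["min"][i], profile["opt"][i], profile["max"][i]
        if dim == -1:
            # Dynamic dimension: only the ordering min <= opt <= max matters.
            if not (lo <= opt <= hi):
                errors.append(f"{name}: dimension {i} needs min <= opt <= max")
        elif not (lo == opt == hi == dim):
            # Static dimension: the profile must match the network exactly.
            errors.append(
                f"{name}: dimension {i} got min={lo}, opt={opt}, max={hi}, "
                f"expected min=opt=max={dim}"
            )
    return errors


# Hypothetical shapes: only the last dimension is fixed by the network.
network_shape = (-1, -1, 1024)
bad_profile = {"min": (1, 1, 768), "opt": (1, 16, 768), "max": (4, 64, 768)}
good_profile = {"min": (1, 1, 1024), "opt": (1, 16, 1024), "max": (4, 64, 1024)}

bad_errors = check_profile("image_embeds", network_shape, bad_profile)
good_errors = check_profile("image_embeds", network_shape, good_profile)
for message in bad_errors:
    print(message)
```

Under these assumptions, the fix is on the profile side: supply 1024 for that dimension, or re-export the model if 768 is the intended size.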