Zero Zeng

Results: 581 comments by Zero Zeng

That's because when dynamic shapes are enabled and you specify a new binding shape for a context, at the first inference TRT has to do a shape inference to...

Just use 1x512 as the opt shape when building the engine.
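For illustration, a build command with 1x512 as the opt shape might look like the following trtexec invocation. This is a sketch only: the input name `input_ids` and the min/max profile bounds are assumptions, not from the original comment.

```shell
# Hypothetical engine build where 1x512 is the opt shape.
# The input name "input_ids" and the 1x1 / 16x512 bounds are illustrative.
trtexec --onnx=model.onnx \
        --saveEngine=model.plan \
        --minShapes=input_ids:1x1 \
        --optShapes=input_ids:1x512 \
        --maxShapes=input_ids:16x512
```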

When dynamic shapes are enabled, TRT selects kernel tactics that have the best performance and are suitable for all input shapes between the min shape and the max shape....
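The selection rule described above can be sketched with a toy model (this is an illustration of the idea, not TRT's actual tactic chooser): a tactic is eligible only if it can run every shape in [min, max], and among eligible tactics the one fastest at the opt shape wins.

```python
# Toy model of tactic selection under dynamic shapes (illustrative only).
# Each "tactic" declares the shape range it supports and a cost model.
tactics = [
    {"name": "tactic_a", "min": 1, "max": 64,  "time_at": lambda n: 0.5 * n},
    {"name": "tactic_b", "min": 1, "max": 512, "time_at": lambda n: 0.8 * n},
]

def select_tactic(tactics, min_shape, opt_shape, max_shape):
    # Keep only tactics that cover the whole [min, max] range...
    valid = [t for t in tactics
             if t["min"] <= min_shape and t["max"] >= max_shape]
    # ...then pick the one with the best performance at the opt shape.
    return min(valid, key=lambda t: t["time_at"](opt_shape))

print(select_tactic(tactics, 1, 16, 128)["name"])  # tactic_b: a can't cover 128
```

Note that tactic_a is faster at the opt shape but is rejected because it cannot handle the max shape; this is why a very wide [min, max] range can cost performance at any single shape.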

your "shorter text shape" != opt shape right? trt only make sure the kernel is able to run with the "shorter text shape" but don't guarantee its performance. only optimize...

Please refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes > Will the input shape be truncated because the context shape is smaller? Yes, it always uses the binding shape as the input, e.g....
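Since TRT always reads exactly the binding shape from the input buffer, the application has to make its data fit that shape. A minimal sketch of that host-side step, assuming a hypothetical helper name and pad id:

```python
import numpy as np

def fit_to_binding_shape(token_ids, binding_len, pad_id=0):
    # TRT reads exactly `binding_len` elements from the bound buffer:
    # shorter sequences must be padded, longer ones get truncated.
    ids = list(token_ids)[:binding_len]          # truncate if too long
    ids += [pad_id] * (binding_len - len(ids))   # pad if too short
    return np.array(ids, dtype=np.int64)

print(fit_to_binding_shape([101, 2023, 102], 5))  # pads out to length 5
```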

@Vinayaks117 I don't have much time now, can you try the latest TRT on your side?

Should be the same issue as https://github.com/NVIDIA/TensorRT/issues/2338; it can be fixed with the preview feature in TRT 8.5.1:

```
&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=model.onnx --preview=+fasterDynamicShapes0805 --saveEngine=model_bs16.plan --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128...
```

I can reproduce this with TRT 8.5.0.9, but I cannot confirm this is an accuracy bug since I see some Pow layers that may amplify the diff. @pranavm-nvidia @ttyio...
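To illustrate why a Pow layer makes small diffs hard to classify as bugs (a toy example, not the issue's actual model): a tiny upstream precision difference can grow considerably after an exponentiation.

```python
import numpy as np

# Toy illustration: a pow(x, 8) layer amplifying a small upstream difference.
a = np.float32(1.001)    # reference activation
b = np.float32(1.0015)   # same activation with a small precision error
diff_in = abs(float(a) - float(b))              # difference before the layer
diff_out = abs(float(a) ** 8 - float(b) ** 8)   # difference after pow(x, 8)
print(diff_in, diff_out)  # the output diff is several times the input diff
```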