Zero Zeng


@ttyio Do you have any recommendations on the QDQ placement here? I think the user can fine-tune it to get better performance.

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_convolution_layer.html We don't support output padding for an IConvolutionLayer or IDeconvolutionLayer. prePadding and postPadding are used for asymmetric padding values, e.g. in a 2D conv, you want to pad...
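For illustration, a minimal sketch of asymmetric padding with the TensorRT Python API (the input shape, channel count, and zero-valued weights are placeholders, not from the original thread):

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

inp = network.add_input("x", trt.float32, (1, 3, 32, 32))
weights = trt.Weights(np.zeros((8, 3, 3, 3), dtype=np.float32))  # placeholder weights

conv = network.add_convolution_nd(inp, num_output_maps=8, kernel_shape=(3, 3), kernel=weights)
conv.pre_padding = (1, 0)   # pad 1 row at the top, 0 columns on the left
conv.post_padding = (0, 1)  # pad 0 rows at the bottom, 1 column on the right
```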

Sorry, I missed the limitation; it should be feasible to do this with ISliceLayer (https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_slice_layer.html). Try adding a slice layer with a negative start in FILL mode, so the out-of-bounds region becomes padding.
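A minimal sketch of that idea with the TensorRT Python API, assuming a 1x3x32x32 input padded by one element on each side of H and W (the names and shapes here are illustrative):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

inp = network.add_input("x", trt.float32, (1, 3, 32, 32))
slice_layer = network.add_slice(
    inp,
    start=(0, 0, -1, -1),   # begin one element before the tensor in H and W
    shape=(1, 3, 34, 34),   # output is 2 larger in H and W
    stride=(1, 1, 1, 1),
)
slice_layer.mode = trt.SliceMode.FILL  # out-of-bounds elements take the fill value (0 by default)
```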

I am not an expert on this; perhaps you can refer to https://github.com/onnx/onnx-tensorrt/blob/c3cfcbc8248c6bd007e6630af2085df5e4834b42/builtin_op_importers.cpp#L2972. That's how we parse an ONNX Pad node into TensorRT. @pranavm-nvidia @kevinch-nv might know more about this.

https://github.com/NVIDIA/TensorRT/issues/2103#issuecomment-1170081821 Your problem seems similar to https://github.com/NVIDIA/TensorRT/issues/2103. Does it apply to your case?

Just did a quick test with polygraphy:
```
[I] onnxrt-runner-N0-08/11/22-13:12:28  ---- Inference Input(s) ----
    {x2paddle_images [dtype=float32, shape=(1, 3, 640, 640)]}
[I] onnxrt-runner-N0-08/11/22-13:12:28  ---- Inference Output(s) ----
    {save_infer_model/scale_0.tmp_0 [dtype=float32, shape=(1, 25200,...
```
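A comparison like this can be reproduced with Polygraphy's Python API; a minimal sketch, assuming the model lives at "model.onnx" (the path is a placeholder):

```python
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

# One runner per framework; both load the same ONNX model.
runners = [
    OnnxrtRunner(SessionFromOnnx("model.onnx")),
    TrtRunner(EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))),
]

# Feed identical (auto-generated) inputs to both runners and compare outputs.
run_results = Comparator.run(runners)
assert bool(Comparator.compare_accuracy(run_results))
```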

I'm not an expert on the Windows platform, but AFAIK Windows has more overhead in context switching or something along those lines. @nvpohanh may have more insight here.

I think TRT should have good out-of-the-box performance for T5 now. Can you try exporting it to ONNX and checking the throughput?
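One way to do that, as a rough sketch (assumptions not from the original comment: the Hugging Face "t5-small" checkpoint, and only the encoder is exported; a full generate() pipeline also needs the decoder and KV cache):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

model = T5EncoderModel.from_pretrained("t5-small").eval()
tokenizer = AutoTokenizer.from_pretrained("t5-small")
inputs = tokenizer("translate English to German: hello world", return_tensors="pt")

# Export the encoder with dynamic batch/sequence axes so TRT can build
# an engine with flexible input shapes.
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "t5_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)
```

Throughput can then be checked with trtexec, e.g. `trtexec --onnx=t5_encoder.onnx`.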