Zero Zeng


@ttyio Do you have any recommendations on the QDQ placement here? I think the user can fine-tune it to get better performance.

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_convolution_layer.html We don't support output padding for an IConvolutionLayer or IDeconvolutionLayer. prePadding and postPadding are used for asymmetric padding values, e.g. in a 2D conv, you want to pad...
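For illustration, a minimal sketch of asymmetric padding with the TensorRT Python API (the input shape, channel count, and zero-valued weights are placeholders, not from the original thread):

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

inp = network.add_input("x", trt.float32, (1, 3, 32, 32))
weights = trt.Weights(np.zeros((8, 3, 3, 3), dtype=np.float32))  # placeholder weights

conv = network.add_convolution_nd(inp, num_output_maps=8, kernel_shape=(3, 3), kernel=weights)
conv.pre_padding = (1, 0)   # pad 1 row at the top, 0 columns on the left
conv.post_padding = (0, 1)  # pad 0 rows at the bottom, 1 column on the right
```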

Sorry, I missed the limitation; it should be feasible to do this with ISliceLayer (https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_slice_layer.html). Try adding a slice layer with a negative start in FILL mode, so the out-of-bounds region becomes padding.
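A minimal sketch of that idea with the TensorRT Python API, assuming a 1x3x32x32 input padded by one element on each side of H and W (the names and shapes here are illustrative):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

inp = network.add_input("x", trt.float32, (1, 3, 32, 32))
slice_layer = network.add_slice(
    inp,
    start=(0, 0, -1, -1),   # begin one element before the tensor in H and W
    shape=(1, 3, 34, 34),   # output is 2 larger in H and W
    stride=(1, 1, 1, 1),
)
slice_layer.mode = trt.SliceMode.FILL  # out-of-bounds elements take the fill value (0 by default)
```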

I am not an expert on this; perhaps you can refer to https://github.com/onnx/onnx-tensorrt/blob/c3cfcbc8248c6bd007e6630af2085df5e4834b42/builtin_op_importers.cpp#L2972. That's how we parse an ONNX Pad node into TensorRT. @pranavm-nvidia @kevinch-nv might know more about this.

https://github.com/NVIDIA/TensorRT/issues/2103#issuecomment-1170081821 Your problem seems similar to https://github.com/NVIDIA/TensorRT/issues/2103. Does it apply to your case?

Just did a quick test with polygraphy:
```
[I] onnxrt-runner-N0-08/11/22-13:12:28  ---- Inference Input(s) ----
    {x2paddle_images [dtype=float32, shape=(1, 3, 640, 640)]}
[I] onnxrt-runner-N0-08/11/22-13:12:28  ---- Inference Output(s) ----
    {save_infer_model/scale_0.tmp_0 [dtype=float32, shape=(1, 25200,...
```
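A comparison like this can be reproduced with Polygraphy's Python API; a minimal sketch, assuming the model lives at "model.onnx" (the path is a placeholder):

```python
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

# One runner per framework; both load the same ONNX model.
runners = [
    OnnxrtRunner(SessionFromOnnx("model.onnx")),
    TrtRunner(EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))),
]

# Feed identical (auto-generated) inputs to both runners and compare outputs.
run_results = Comparator.run(runners)
assert bool(Comparator.compare_accuracy(run_results))
```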

I'm not an expert on the Windows platform, but AFAIK Windows has more overhead in context switching or something along those lines. @nvpohanh may have more insight here.

I think TRT should have good out-of-the-box performance for T5 now. Can you try exporting it to ONNX and checking the throughput?
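One way to do that, as a rough sketch (assumptions not from the original comment: the Hugging Face "t5-small" checkpoint, and only the encoder is exported; a full generate() pipeline also needs the decoder and KV cache):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

model = T5EncoderModel.from_pretrained("t5-small").eval()
tokenizer = AutoTokenizer.from_pretrained("t5-small")
inputs = tokenizer("translate English to German: hello world", return_tensors="pt")

# Export the encoder with dynamic batch/sequence axes so TRT can build
# an engine with flexible input shapes.
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "t5_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)
```

Throughput can then be checked with trtexec, e.g. `trtexec --onnx=t5_encoder.onnx`.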