Zero Zeng

Results 571 comments of Zero Zeng

Do you mean how to do it with the TensorRT API? You can check our developer guide and API docs.

Please check our sample (https://github.com/NVIDIA/TensorRT/tree/release/8.6/tools/pytorch-quantization/examples) and the documentation.

For example: https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#document-tutorials/creating_custom_quantized_modules

What if you add an extra batch dimension, so that the inputs are shaped like 1 x old_batch x len x ...?
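A minimal sketch of that reshape, working only on the shape itself: it prepends a batch dimension of 1 so that the old batch becomes an ordinary dimension. The helper name `addLeadingBatchDim` is hypothetical, and the shape is held in a plain `std::vector` rather than a TensorRT type.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: turn {old_batch, len, ...} into {1, old_batch, len, ...}.
std::vector<int64_t> addLeadingBatchDim(const std::vector<int64_t>& shape) {
    std::vector<int64_t> out;
    out.reserve(shape.size() + 1);
    out.push_back(1);                                   // new outer batch dimension
    out.insert(out.end(), shape.begin(), shape.end());  // original dims follow unchanged
    return out;
}
```

The element count is unchanged, so the underlying buffer can be reused as-is; only the shape reported to the network differs.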

Use `delete runtime` or use a smart pointer.
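The two options can be sketched as follows. To keep the example compilable without TensorRT, `FakeRuntime` and `createFakeRuntime` are hypothetical stand-ins; in real code the pointer would presumably come from `nvinfer1::createInferRuntime`. The `live` counter exists only to show that both paths release the object.

```cpp
#include <memory>

// Hypothetical stand-in for the runtime object, so the sketch builds without TensorRT.
struct FakeRuntime {
    static int live;            // number of FakeRuntime objects currently alive
    FakeRuntime() { ++live; }
    ~FakeRuntime() { --live; }
};
int FakeRuntime::live = 0;

// Hypothetical factory that returns an owning raw pointer.
FakeRuntime* createFakeRuntime() { return new FakeRuntime(); }

// Option 1: release the object manually with delete.
int manualCleanup() {
    FakeRuntime* runtime = createFakeRuntime();
    // ... use runtime ...
    delete runtime;             // caller is responsible for this call
    return FakeRuntime::live;   // objects still alive afterwards
}

// Option 2: let a smart pointer release it when it goes out of scope.
int smartPointerCleanup() {
    {
        std::unique_ptr<FakeRuntime> runtime{createFakeRuntime()};
        // ... use runtime.get() or runtime-> ...
    }                           // destructor runs here, even on early return or exception
    return FakeRuntime::live;   // objects still alive afterwards
}
```

The smart-pointer form is usually the safer choice because the release cannot be skipped by an early return or an exception.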

> Deprecated interface will be removed in TensorRT 10.0.

It means that if you compile the code with TRT 10.0, you will get a compile error.

Usually, it's caused by sub-optimal Q/DQ placement. Could you please refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work-with-qat-networks? You can also compare the verbose logs and check the layer-wise precision and performance to find out the reason....