Zero Zeng
@ttyio for the questions above.
Do you mean how to do it with the TensorRT API? You can check our developer guide and API docs.
Please check our sample (https://github.com/NVIDIA/TensorRT/tree/release/8.6/tools/pytorch-quantization/examples) and documentation.
For example: https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#document-tutorials/creating_custom_quantized_modules
@nvpohanh @zhenhuaw-me ^ ^
@nvpohanh any comments? ^ ^
What if you add an extra batch dimension, so the inputs look like 1 x old_batch x len x ...?
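A rough sketch of what I mean, assuming the inputs are prepared as NumPy arrays. The names `old_batch`, `seq_len` and `hidden` and their sizes are made up for illustration; they are not from your model.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
old_batch, seq_len, hidden = 4, 16, 8

# Original input layout: old_batch x seq_len x hidden.
x = np.zeros((old_batch, seq_len, hidden), dtype=np.float32)

# Prepend a new leading dimension of size 1:
# 1 x old_batch x seq_len x hidden.
x_batched = np.expand_dims(x, axis=0)

print(x.shape)          # (4, 16, 8)
print(x_batched.shape)  # (1, 4, 16, 8)
```

The network input would then need to be declared with the same extra leading dimension, so the old batch axis becomes an ordinary data axis.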
Use `delete runtime` or a smart pointer.
> Deprecated interface will be removed in TensorRT 10.0.

It means that if you compile the code with TRT 10.0, you will get a compile error.
Usually it's caused by sub-optimal Q/DQ placement. Could you please refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work-with-qat-networks? You can also compare the verbose logs and check the layer-wise precision and performance to find the reason.