BladeDISC [Quantization] refactor ptq of trt backend

Open chenbohua3 opened this issue 2 years ago • 0 comments

The overall process can be divided into the following steps:

[x] Make each subgraph executable, so that the inputs of each subgraph can be collected when running inference on the calibration data.
[x] Modify the graph and add a data-collector node.
[x] Save the inputs and use the TRT calibrator to build the quantized engine.
[x] Drop q_val from the codebase since it is no longer needed (the QAT model comes from blade_compression, and PTQ is done by TRT itself).
[ ] Remove the node that was added to c_module during the data-collection process.
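
The collect-then-remove flow in the steps above can be sketched in plain Python. This is only an illustration and not BladeDISC's implementation: `DataCollector`, `add_collectors`, `remove_collectors`, and the toy subgraphs `sg0`/`sg1` are all hypothetical names, and ordinary callables stand in for executable subgraphs.

```python
from collections import defaultdict


class DataCollector:
    """Hypothetical wrapper that records every input passed to one subgraph."""

    def __init__(self, name, subgraph, store):
        self.name = name          # key under which inputs are saved
        self.subgraph = subgraph  # the original, executable subgraph
        self.store = store        # shared dict: name -> list of input tuples

    def __call__(self, *inputs):
        # Save the inputs first, then run the subgraph unchanged.
        self.store[self.name].append(inputs)
        return self.subgraph(*inputs)


def add_collectors(subgraphs, store):
    """Wrap each subgraph with a collector (the 'add data-collector node' step)."""
    return {name: DataCollector(name, fn, store) for name, fn in subgraphs.items()}


def remove_collectors(wrapped):
    """Restore the original subgraphs once calibration inputs are saved."""
    return {name: collector.subgraph for name, collector in wrapped.items()}


# Toy stand-ins for two executable subgraphs (assumption: plain callables).
subgraphs = {"sg0": lambda x: x * 2, "sg1": lambda x: x + 1}
store = defaultdict(list)
wrapped = add_collectors(subgraphs, store)

# Run inference over the calibration data; sg1 consumes the output of sg0.
calibration_data = [1.0, 2.0, 3.0]
for sample in calibration_data:
    hidden = wrapped["sg0"](sample)
    wrapped["sg1"](hidden)

# After collection, drop the added wrappers again.
restored = remove_collectors(wrapped)
```

In a real pipeline the saved inputs in `store` would be handed to a TensorRT INT8 calibrator (for example a subclass of `trt.IInt8EntropyCalibrator2`) when building the engine. That part is left out here because it needs TensorRT and a GPU.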

Aug 02 '22 00:08 chenbohua3