BladeDISC
[Quantization] refactor ptq of trt backend
The overall process can be divided into the following steps:
- [x] make each subgraph executable, so that the inputs of each subgraph can be collected while running inference on the calibration data.
- [x] modify the graph to add a data-collector node.
- [x] save the collected inputs and use the TensorRT calibrator to build the quantized engine.
- [x] drop q_val from the codebase, since it is no longer needed (QAT models come from blade_compression, and PTQ is done by TensorRT itself).
- [ ] remove the node that was added to c_module during the data-collection process.
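
The data-collection idea in the first two steps can be illustrated with a minimal, framework-free sketch. This is not the code from this PR: `InputCollector`, `collect_calibration_inputs`, and the assumption that subgraphs run as a simple sequential chain are all hypothetical, chosen only to show how wrapping each executable subgraph lets its inputs be recorded during a calibration run.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


class InputCollector:
    """Hypothetical stand-in for a data-collector node.

    Wraps an executable subgraph and records every input it receives.
    """

    def __init__(self, name: str, fn: Callable[..., Any]) -> None:
        self.name = name
        self.fn = fn
        self.records: List[Tuple[Any, ...]] = []

    def __call__(self, *args: Any) -> Any:
        self.records.append(args)  # save the inputs for later calibration
        return self.fn(*args)      # then run the subgraph unchanged


def collect_calibration_inputs(
    subgraphs: List[Tuple[str, Callable[[Any], Any]]],
    calibration_data: Iterable[Any],
) -> Dict[str, List[Tuple[Any, ...]]]:
    """Run calibration samples through the wrapped subgraphs.

    Assumes (for illustration only) that the subgraphs form a linear chain
    where each one consumes the previous one's output.
    """
    collectors = [InputCollector(name, fn) for name, fn in subgraphs]
    for sample in calibration_data:
        value = sample
        for collector in collectors:
            value = collector(value)
    return {c.name: c.records for c in collectors}


# Toy usage: two "subgraphs" that scale and then shift a number.
toy_subgraphs = [("scale", lambda x: x * 2), ("shift", lambda x: x + 1)]
collected = collect_calibration_inputs(toy_subgraphs, [1.0, 2.0])
```

In a real pipeline the recorded inputs for each subgraph would then be handed to the backend's calibrator when building the quantized engine; that step is omitted here because it depends on the backend API.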