After quantization, the model can still run inference on specific GPUs (T4, V100, A100), but slower than without quantization, right?

Open sisrfeng opened this issue 4 years ago • 0 comments

Quantization is currently only supported for CPUs.

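Someone on Zhihu says: Q: Can a model quantized with PyTorch run predictions on a GPU? A: Yes, because the operations inside are all fake-quantization ops. The predictions should still be correct, just with slightly lower accuracy, and inference will be slower. You only see a speedup after deploying to specific hardware such as FPGAs/ASICs.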
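To make the "fake-quantization" claim concrete, here is a minimal pure-Python sketch of what such an op does as I understand it: it rounds a float onto an int8 grid and immediately converts it back to float. The helper name `fake_quantize` and the chosen `scale`/`zero_point` are my own assumptions for illustration, not PyTorch's implementation (PyTorch's ops may differ in rounding details).

```python
def fake_quantize(x, scale, zero_point, quant_min=-128, quant_max=127):
    """Hypothetical helper: quantize to an int8 grid, then dequantize to float."""
    q = round(x / scale) + zero_point          # snap to the nearest integer level
    q = max(quant_min, min(quant_max, q))      # clamp to the int8 range
    return (q - zero_point) * scale            # back to float: no integer math downstream


# Assumed symmetric int8 parameters covering roughly [-1, 1].
scale, zero_point = 1.0 / 127, 0

values = [0.1234, -0.4, 0.9876]
simulated = [fake_quantize(v, scale, zero_point) for v in values]
errors = [abs(a - b) for a, b in zip(values, simulated)]

for v, s, e in zip(values, simulated, errors):
    print(f"{v:+.4f} -> {s:+.4f} (abs error {e:.5f})")
```

The outputs are still ordinary floats, so any device that runs float math can execute them. That would explain why the result is slightly less accurate and no faster: the extra round/clamp steps add work, and no integer kernels are used.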

Aug 14 '21 10:08 sisrfeng