espnet_onnx
Quantized model is slower than raw model
I tested espnet_onnx with a Conformer model. I evaluated 100 wav files 10 times and calculated the RTF from the forward time only. The results are:
| model | cpu | gpu |
|---|---|---|
| fp32 | 0.0180668 | 0.00263397 |
| quantize | 0.0172804 | 0.0124609 |
The quantized model is much slower than the fp32 model on GPU and only a little faster on CPU.
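For anyone who wants to reproduce the numbers, here is a minimal sketch of how an RTF over forward time only might be computed. It assumes RTF means total forward time divided by total audio duration. The names `measure_rtf`, `forward`, `utterances`, and `durations_sec` are hypothetical and are not part of espnet_onnx. `forward` stands for whatever callable runs the model on one utterance.

```python
import time
from typing import Any, Callable, Sequence


def measure_rtf(
    forward: Callable[[Any], Any],
    utterances: Sequence[Any],
    durations_sec: Sequence[float],
    repeats: int = 10,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return total forward time divided by total audio duration.

    `forward` is a hypothetical stand-in for the model call; only the
    time spent inside it is counted (no audio loading, no decoding
    outside the call).
    """
    total_forward = 0.0
    for _ in range(repeats):
        for utt in utterances:
            start = clock()
            forward(utt)
            total_forward += clock() - start
    # Every utterance is processed `repeats` times, so the audio
    # duration is scaled by the same factor.
    total_audio = repeats * sum(durations_sec)
    return total_forward / total_audio
```

With 100 wav files and `repeats=10` this matches the setup described above. The `clock` parameter is only there so the function can be checked with a fake timer.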
System information:
- torch / cuda / GPU: 11.0 / 11.6 / A100
- cpu: AMD EPYC 7402 24-Core Processor
- onnx: 1.10.1
- onnxruntime-gpu: 1.13.1
- espnet_onnx: 0.1.9

Have you tested the speed of the quantized model on GPU?
Hi @jinggaizi, GPU inference of quantized models is not supported in onnxruntime, which is why it is slow.
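A rough sketch of how one might act on this when creating sessions: pick the CPU provider for a quantized model and prefer CUDA otherwise. `choose_providers` is a hypothetical helper, not an espnet_onnx or onnxruntime function, and the policy it encodes is only an assumption based on the reply above. The only onnxruntime call used is `get_available_providers()`.

```python
from typing import List, Sequence


def choose_providers(available: Sequence[str], quantized: bool) -> List[str]:
    """Hypothetical policy: quantized models go to CPU, others prefer CUDA."""
    if quantized or "CUDAExecutionProvider" not in available:
        return ["CPUExecutionProvider"]
    # Keep CPU as a fallback for ops the CUDA provider cannot run.
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed; skipping provider check")
    else:
        available = ort.get_available_providers()
        print("available:", available)
        print("fp32     ->", choose_providers(available, quantized=False))
        print("quantize ->", choose_providers(available, quantized=True))
```

The returned list can be passed as the `providers` argument of `onnxruntime.InferenceSession`. After creating a session, `session.get_providers()` reports which providers it is actually using.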
Thanks, I will learn TensorRT to support this mode.
Hi, did you encounter the problem described in https://github.com/espnet/espnet_onnx/issues/70?