
Quantisation

Open artsokol opened this issue 3 years ago • 2 comments

Hello! I found that the weights are dumped in float32 format, which significantly impacts the model's inference speed on CPU: in my test, 10 ms of audio takes 25 ms to process. Could you clarify whether you have tested weight quantization, as implemented in the RNNoise model?

artsokol avatar Oct 05 '21 12:10 artsokol

Hello, sorry for the late reply. Quantization has not been tested on PercepNet yet. I think you can apply quantization fairly easily by following the official PyTorch quantization documentation: https://pytorch.org/docs/stable/quantization.html
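For reference, a minimal sketch of PyTorch post-training dynamic quantization. The `TinyRNN` class below is a hypothetical stand-in for a PercepNet-like GRU network (the layer sizes are made up, not PercepNet's actual architecture):

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    """Hypothetical stand-in for a PercepNet-like GRU network."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(input_size=70, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, 68)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.fc(out)

model = TinyRNN().eval()

# Dynamic quantization: weights are stored as int8, activations are
# quantized on the fly at inference time. Well suited to RNN/Linear
# layers where inference is weight-bandwidth bound.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.GRU, nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 100, 70)  # (batch, time, features)
with torch.no_grad():
    y = qmodel(x)
```

Note that dynamic quantization mainly helps when PyTorch itself runs the inference; it does not change the weights dumped for a C++ runtime.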

In RNNoise, however, the Keras model's fp32 weights are quantized and dumped into a C++ int8 weight header; then, in the neural-network computation step, the int8 values are converted back to fp32 by dividing by 128. So their speedup comes not from the quantization itself but from applying vectorization (instruction-level parallelism) to the neural-network computation in C++.
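A small sketch of that divide-by-128 scheme, assuming (as RNNoise does) that the trained weights lie roughly in [-1, 1); the array here is random illustrative data, not real model weights:

```python
import numpy as np

# Quantize fp32 weights to int8 by scaling by 128, as in RNNoise's
# dumped weight headers.
rng = np.random.default_rng(0)
w_fp32 = rng.uniform(-1, 1, size=(4, 4)).astype(np.float32)

w_int8 = np.clip(np.round(w_fp32 * 128), -128, 127).astype(np.int8)

# At inference time the int8 weights are converted back to float by
# dividing by 128 -- this only shrinks storage; the arithmetic is
# still done in fp32, so it is not faster by itself.
w_restored = w_int8.astype(np.float32) / 128.0

max_err = np.abs(w_fp32 - w_restored).max()
print(max_err)  # small: on the order of the quantization step
```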

So the biggest reason processing currently takes so long on PercepNet is that vectorization has not been applied yet. By changing the CMakeLists.txt in this repo you can enable vectorization, which will make processing faster. If you have some understanding of the CMake build system, feel free to make a pull request. Thanks!
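One possible way to do this (a sketch, not the repo's actual CMakeLists.txt) is to pass optimization flags that let the compiler auto-vectorize the inner loops with SIMD instructions:

```cmake
# Hypothetical addition to CMakeLists.txt: enable compiler
# auto-vectorization of the neural-network inner loops.
if(MSVC)
  add_compile_options(/O2 /arch:AVX2)
else()
  # -march=native targets the build machine's SIMD extensions
  # (SSE/AVX), so the resulting binary is not portable.
  add_compile_options(-O3 -march=native -funroll-loops)
endif()
```

Hand-written SIMD intrinsics (as RNNoise uses in its vectorized dot products) would go further, but compiler flags are the low-effort first step.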

jzi040941 avatar Oct 12 '21 03:10 jzi040941

Thank you for your explanation!

artsokol avatar Oct 12 '21 06:10 artsokol