PercepNet
Quantisation
Hello! I found that the weights are dumped in float32 format, which significantly impacts the model's inference speed on CPU: in my test, processing 10 ms of audio takes 25 ms. Could you please clarify whether you have tested quantisation of the weights, as implemented in the RNNoise model?
Hello, sorry for the late reply. Quantization has not been tested on PercepNet yet. I think you can apply quantization fairly easily by following the official PyTorch Quantization documentation: https://pytorch.org/docs/stable/quantization.html
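For example, here is a minimal sketch of PyTorch's post-training dynamic quantization on a toy GRU/dense model standing in for PercepNet (the layer sizes and shapes are invented for illustration, not the real network's):

```python
import torch
import torch.nn as nn

# Toy stand-in for the PercepNet network; sizes are made up.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(70, 128)
        self.gru = nn.GRU(128, 128, batch_first=True)
        self.out = nn.Linear(128, 68)

    def forward(self, x):
        x = torch.relu(self.dense(x))
        x, _ = self.gru(x)
        return torch.sigmoid(self.out(x))

model = TinyNet().eval()

# Post-training dynamic quantization: weights are stored as int8,
# activations are quantized on the fly. Supports Linear/GRU/LSTM.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear, nn.GRU}, dtype=torch.qint8
)

x = torch.randn(1, 100, 70)  # (batch, frames, features)
print(qmodel(x).shape)       # torch.Size([1, 100, 68])
```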
But note that in RNNoise, the quantized weights are dumped from the Keras model (fp32) into a C++ int8 weight header, and in the neural-calculation step they are converted back from int8 to fp32 by dividing by 128. So their speed-up comes not from quantization itself but from applying vectorization (instruction-level parallelism) to the neural calculation in C++.
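For illustration, a small sketch (NumPy, invented values) of the int8 round trip described above: weights scaled by 128 and clamped when dumped, then divided by 128 at compute time to recover approximate fp32 values:

```python
import numpy as np

# Example weights, assumed to lie roughly in [-1, 1).
w_fp32 = np.array([0.5, -0.25, 0.99, -1.0], dtype=np.float32)

# Dump step: fp32 -> int8 (scale by 128, clamp to the int8 range).
w_int8 = np.clip(np.round(w_fp32 * 128), -128, 127).astype(np.int8)

# Inference step: int8 -> fp32 by dividing by 128.
w_restored = w_int8.astype(np.float32) / 128.0

print(w_int8)      # [  64  -32  127 -128]
print(w_restored)  # [ 0.5  -0.25  0.9921875  -1. ]
```

The values come back only approximately, which is why this scheme trades a little precision for a much smaller weight table; the actual speed gain still depends on how the multiply-accumulate loop is vectorized.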
So the biggest reason processing currently takes so long on PercepNet is that vectorization has not been applied yet. By changing the CMakeLists.txt in this repo you can enable vectorization and make processing faster; see the sketch below. If you have some understanding of the CMake build system, feel free to open a pull request. Thanks!
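As a starting point, a hedged sketch of what such a CMake change might look like (the target name `percepnet` and the exact flags are assumptions, not this repo's actual configuration):

```cmake
# Sketch only: target name and flags are assumptions. -O3 enables the
# compiler's auto-vectorizer; -march=native lets it emit the host
# CPU's SIMD instructions (e.g. AVX) in the inner neural-network loops.
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release)
endif()
target_compile_options(percepnet PRIVATE -O3 -march=native)
```

Note that `-march=native` ties the binary to the build machine's CPU; a portable build would pin a specific baseline such as `-mavx2` instead.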
Thank you for your explanation!