FunASR
How to choose the quantization width when exporting? 16-bit vs 8-bit (for streaming)
Notice: In order to resolve issues more efficiently, please raise your issue following the template and fill in the details.
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
I am trying to quantize a streaming model that exports as a decoder.onnx and a model.onnx. Is there a way to change how they are quantized (8-bit vs 16-bit) so I can trade off the size of the resulting model_quant.onnx and decoder_quant.onnx?
Code
What have you tried?
I tried onnxruntime.quantization.quantize_dynamic(), but it doesn't work when there are both a model and a decoder. I have also looked into the funasr model.export() function but couldn't find an option for changing the quantization.
What's your environment?
- OS (e.g., Linux):
- FunASR Version (e.g., 1.0.0):
- ModelScope Version (e.g., 1.11.0):
- PyTorch Version (e.g., 2.0.0):
- How you installed funasr (pip, source):
- Python version:
- GPU (e.g., V100M32):
- CUDA/cuDNN version (e.g., cuda11.7):
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
- Any other relevant information: