FunASR
GPU inference on an A10 card is no faster than CPU; unclear where the problem is
Notice: In order to resolve issues more efficiently, please raise the issue following the template and supply the details it asks for.
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
Following https://github.com/modelscope/FunASR/blob/e8f535f53320780cd8ed6f3b8588b187935d3ae5/runtime/onnxruntime/readme.md, I built the onnxruntime runtime binary with GPU=ON.
With quantization enabled, the speedup ratio tops out at only around 300, which is very close to the CPU version. GPU utilization does sit at around 70%. Why is that?
Code
Build command: cmake -DCMAKE_BUILD_TYPE=release .. -DONNXRUNTIME_DIR=/home/ubuntu/github/FunASR/onnxruntime-linux-x64-1.14.0 -DFFMPEG_DIR=/home/ubuntu/github/FunASR/ffmpeg-master-latest-linux64-gpl-shared -DGPU=on
Model export command:
funasr-export ++model=damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch ++quantize=true ++device=cuda ++type=torchscript
Inference command:
funasr-onnx-offline-rtf --model-dir /home/ubuntu/.cache/modelscope/hub/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --vad-dir /home/ubuntu/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch --punc-dir /home/ubuntu/.cache/modelscope/hub/damo/punc_ct-transformer_cn-en-common-vocab471067-large --gpu --thread-num 20 --batch-size 48 --quantize true --wav-path ./test100.scp
and
What have you tried?
What's your environment?
- OS (e.g., Linux):
- FunASR Version (e.g., 1.0.0):
- ModelScope Version (e.g., 1.11.0):
- PyTorch Version (e.g., 2.0.0):
- How you installed funasr (pip, source):
- Python version:
- GPU (e.g., V100M32):
- CUDA/cuDNN version (e.g., cuda11.7):
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
- Any other relevant information:
python=3.8; funasr and modelscope are both the latest versions
For GPU deployment, please refer to https://github.com/modelscope/FunASR/blob/main/runtime/docs/SDK_advanced_guide_offline_gpu_zh.md
The reason is the CIF module inside Paraformer: its dynamic, data-dependent loop runs very slowly on GPU.
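To illustrate why that loop resists GPU acceleration, here is a minimal NumPy sketch of a CIF-style integrate-and-fire loop (a toy illustration, not FunASR's actual implementation; the function name `cif_fire` and the threshold of 1.0 are assumptions for the example). Each timestep's branch depends on the accumulator produced by the previous timestep, so the frames cannot be processed in parallel, and a GPU gains little over a CPU:

```python
import numpy as np

def cif_fire(alphas, hidden, threshold=1.0):
    """Toy integrate-and-fire: accumulate per-frame weights `alphas`
    over encoder outputs `hidden` (T, D); emit an integrated vector
    every time the accumulated weight crosses `threshold`."""
    integrated = np.zeros(hidden.shape[1])
    accum = 0.0
    fired = []
    for t in range(len(alphas)):       # sequential, data-dependent loop
        a = alphas[t]
        if accum + a >= threshold:     # boundary fires this frame
            remain = threshold - accum
            integrated += remain * hidden[t]
            fired.append(integrated)
            # leftover weight starts the next token's accumulation
            integrated = (a - remain) * hidden[t]
            accum = a - remain
        else:                          # keep integrating
            integrated += a * hidden[t]
            accum += a
    return fired

# Four frames with weight 0.5 each -> fires twice (after frames 2 and 4).
fired = cif_fire(np.full(4, 0.5), np.ones((4, 2)))
```

Because the decision at step `t` (fire or not) depends on `accum` from step `t-1`, the loop cannot be vectorized across time; each GPU kernel launch does only a tiny amount of work, so the throughput advantage of the GPU is largely lost on this module.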