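SenseVoice: the sensevoice-onnx model takes more than ten seconds to load every time it recognizes a new (previously unseen) audio input, which severely hurts inference efficiency. How can this be solved?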

Oct 10 '24 02:10 Nicksooooo

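This happens not only on the first audio input after the model starts. Every time a new audio clip is fed in, the model takes more than ten seconds to load, which severely hurts efficiency. How can this be solved?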

Oct 10 '24 02:10 Nicksooooo

How do I specify the GPU for ONNX model inference? I set the device_id parameter with model = SenseVoiceSmall(model_dir, batch_size=10, quantize=False, disable_update=True, device_id=1) and checked nvtop, but the setting appears to have no effect.

Dec 17 '24 02:12 Jimmy-L99

Feel free to reference this repo. It is an end-to-end version that includes the STFT process. Simply provide the audio input to obtain the ASR result. You can also customize the model parameters to export a more efficient version than the official one.

If you are on Windows, start by running pip install onnxruntime-directml --upgrade and set the execution provider to DmlExecutionProvider for easy GPU usage. For NVIDIA GPUs, try pip install onnxruntime-gpu --upgrade and set the provider to CUDAExecutionProvider; this should work. Refer to the official documentation for more details on GPU settings.
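
For anyone calling ONNX Runtime directly, here is a minimal sketch of how the provider and GPU index might be selected. The helper names `choose_providers` and `create_session` and the `model_path` argument are hypothetical and not part of FunASR or the repo mentioned above. The sketch assumes the exported model can be loaded with a plain `onnxruntime.InferenceSession`.

```python
from typing import List, Tuple, Union

Provider = Union[str, Tuple[str, dict]]


def choose_providers(available: List[str], device_id: int = 0) -> List[Provider]:
    """Build an ordered provider list: one GPU provider if present, then CPU."""
    providers: List[Provider] = []
    if "CUDAExecutionProvider" in available:
        # NVIDIA GPUs (onnxruntime-gpu); device_id selects the GPU index.
        providers.append(("CUDAExecutionProvider", {"device_id": device_id}))
    elif "DmlExecutionProvider" in available:
        # DirectML on Windows (onnxruntime-directml).
        providers.append(("DmlExecutionProvider", {"device_id": device_id}))
    # CPU is always appended as the fallback.
    providers.append("CPUExecutionProvider")
    return providers


def create_session(model_path: str, device_id: int = 0):
    """Create a session once and reuse it for every audio input."""
    import onnxruntime as ort  # imported here so the helper above has no dependencies

    providers = choose_providers(ort.get_available_providers(), device_id)
    session = ort.InferenceSession(model_path, providers=providers)
    # Print the providers actually in use to check whether the GPU was picked up.
    print("Active providers:", session.get_providers())
    return session
```

If `session.get_providers()` lists only `CPUExecutionProvider`, the GPU build of ONNX Runtime is probably not the one installed, which would also explain a `device_id` setting that has no visible effect in nvtop.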

Dec 17 '24 05:12 DakeQQ