ASR English model cannot recognize my voice well

Open jims57 opened this issue 9 months ago • 14 comments

Why do all the English ASR models fail to recognize speech well, while the Chinese ones seem to work well?

Have you tested the English models? Am I missing anything needed to make them accurate? Thanks.

For example, this model: sherpa-ncnn-streaming-zipformer-20m-2023-02-17

jims57 avatar Mar 01 '25 07:03 jims57

I don't think it is about my English accent, because even when I use standard English audio, it doesn't work well. The recognized result seems bad.

jims57 avatar Mar 01 '25 07:03 jims57

Please list all the models you have tested.

You only listed one model.

csukuangfj avatar Mar 01 '25 09:03 csukuangfj

I have tested all the models listed in sherpa, and I don't think any of them is good enough at recognizing English, though the Chinese ones are good.

jims57 avatar Mar 09 '25 03:03 jims57

By the way, please see: https://k2-fsa.github.io/icefall/model-export/export-ncnn-conv-emformer.html#export-the-model-via-torch-jit-trace

Image

jims57 avatar Mar 09 '25 03:03 jims57

I have tested the model in both the onnx and ncnn examples for Swift. They cannot recognize English well; I would say it is very bad, with a WER of at least 50%. Am I missing anything? Could you try the model with the Swift sample? Thanks.
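For reference, a WER figure like the one above can be checked with a small script instead of estimated by ear. The following is a minimal sketch, assuming plain whitespace tokenization and case-insensitive matching; the function name `word_error_rate` and the two example sentences are made up for illustration and are not part of sherpa-onnx.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: 1 substitution + 2 deletions over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the hat sat mat"))  # 0.5
```

Running this over a handful of reference transcripts and the recognizer's output for each model gives a number that can be compared across models and posted in a report.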

jims57 avatar Mar 09 '25 03:03 jims57

[Next-gen Kaldi: English speech recognition demo of the latest zipformer on iOS - Bilibili] https://b23.tv/K87JkfW

[Next-gen Kaldi: Two-pass real-time English speech recognition on iOS (Obama speech) - Bilibili] https://b23.tv/YS2QNmR

We have already provided English ASR video demos.

If it is not working for you, please recheck.

csukuangfj avatar Mar 09 '25 03:03 csukuangfj

@csukuangfj I got the same error when running the Flutter streaming_asr example with an English model such as sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2.

It cannot recognize English after recording is stopped and started again. See the video demo of the error on YouTube.

docaohuynhcse avatar Apr 22 '25 12:04 docaohuynhcse

Please use a larger model.

csukuangfj avatar Apr 22 '25 13:04 csukuangfj

I tried sherpa-onnx-streaming-zipformer-en-2023-06-21, and it seems to work. The problem is that this model is too large for mobile. I also tried Chinese with the small model asr-models/sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23, and it still works. Can you make an English model that works at a smaller size?

docaohuynhcse avatar Apr 22 '25 14:04 docaohuynhcse

Can you tell us how large you think is too large for mobile?

csukuangfj avatar Apr 22 '25 14:04 csukuangfj

The model sherpa-onnx-streaming-zipformer-en-2023-06-21 is about 500M, so it makes the app size larger than 600M. I think an app of that size is too big.

docaohuynhcse avatar Apr 22 '25 14:04 docaohuynhcse

> Model sherpa-onnx-streaming-zipformer-en-2023-06-21 ~ 500M

No, we don't have model files that large.

Please keep only the files you use in the code and remove other unused files.

There is no need to keep unused files in your app.
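The advice above can be sketched as a small script that picks out the files an app would load and lists everything else as removable. This is only an illustration under stated assumptions: it assumes the extracted model directory holds `encoder*.onnx`, `decoder*.onnx`, and `joiner*.onnx` files (possibly in both regular and `.int8.` variants) plus `tokens.txt`, and that the code loads exactly one variant of each. The helper names `select_runtime_files` and `unused_files` are hypothetical; check the actual file names in your download and the paths your code references before deleting anything.

```python
from pathlib import Path


def select_runtime_files(model_dir: str, prefer_int8: bool = True) -> list[Path]:
    """Pick one encoder/decoder/joiner file plus tokens.txt (assumed layout)."""
    root = Path(model_dir)
    keep = []
    for part in ("encoder", "decoder", "joiner"):
        candidates = sorted(root.glob(f"{part}*.onnx"))
        if not candidates:
            raise FileNotFoundError(f"no {part}*.onnx file found in {root}")
        int8 = [p for p in candidates if ".int8." in p.name]
        regular = [p for p in candidates if ".int8." not in p.name]
        preferred = int8 if prefer_int8 else regular
        # Fall back to whatever exists if the preferred variant is absent.
        keep.append((preferred or candidates)[0])

    tokens = root / "tokens.txt"
    if not tokens.is_file():
        raise FileNotFoundError(f"tokens.txt not found in {root}")
    keep.append(tokens)
    return keep


def unused_files(model_dir: str, keep: list[Path]) -> list[Path]:
    """Every file under model_dir that is not in the keep list."""
    kept = set(keep)
    return sorted(
        p for p in Path(model_dir).rglob("*") if p.is_file() and p not in kept
    )
```

Calling `select_runtime_files` on the extracted model directory and summing `p.stat().st_size` over the result gives an estimate of what the model actually adds to the app bundle; `unused_files` lists the candidates for removal. Which variant to keep is a choice for the app author, since the two variants may differ in size and accuracy.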

csukuangfj avatar Apr 22 '25 14:04 csukuangfj

I suggest that you have a look at the Flutter apps we provide

https://k2-fsa.github.io/sherpa/onnx/flutter/pre-built-app.html#streaming-speech-recognition-stt-asr

Are our apps that large?

csukuangfj avatar Apr 22 '25 14:04 csukuangfj

> Please keep only the files you use in the code and remove other unused files.

Thank you very much! Now I know how to make the app size smaller.

docaohuynhcse avatar Apr 22 '25 16:04 docaohuynhcse