SenseVoiceSmall cannot recognize English abbreviations in Japanese speech

Open · jason-ni opened this issue 1 year ago · 0 comments

🐛 Bug

When I run SenseVoiceSmall ASR on a Japanese news clip, every English abbreviation (here "EV" and "BYD") is dropped from the transcript. Below is a comparison of the SenseVoiceSmall output and the whisper.cpp output for the same audio:

SenseVoice

今回のジャパンモビリティーショでは様々な電気自動車が展示されました 例えば中国のは4つの車輪それぞれに独立したモーターをつけた車を発表

whisper.cpp

[00:00:00.600 --> 00:00:06.540]  今回のジャパンモビリティショーでは様々なEV、電気自動車が展示されました
[00:00:06.540 --> 00:00:14.720]  例えば中国のBYDは4つの車輪それぞれに独立したモーターをつけた車を発表
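To make the gap easy to check on other clips, here is a minimal Python sketch that lists the abbreviations present in one transcript but missing from another. It assumes that "English abbreviation" means a run of two or more uppercase ASCII letters, and `missing_abbreviations` is a hypothetical helper, not part of SenseVoice or whisper.cpp. The two strings are the outputs above, with the whisper.cpp timestamps removed.

```python
import re

# Transcripts copied from the outputs above (whisper.cpp timestamps stripped).
SENSEVOICE = "今回のジャパンモビリティーショでは様々な電気自動車が展示されました 例えば中国のは4つの車輪それぞれに独立したモーターをつけた車を発表"
WHISPER = "今回のジャパンモビリティショーでは様々なEV、電気自動車が展示されました 例えば中国のBYDは4つの車輪それぞれに独立したモーターをつけた車を発表"

# Assumption: runs of two or more uppercase ASCII letters count as English abbreviations.
ABBREV = re.compile(r"[A-Z]{2,}")


def missing_abbreviations(reference: str, hypothesis: str) -> list[str]:
    """Return abbreviations found in `reference` that do not occur in `hypothesis`."""
    return [tok for tok in ABBREV.findall(reference) if tok not in hypothesis]


print(missing_abbreviations(WHISPER, SENSEVOICE))  # ['EV', 'BYD']
```

A plain substring check is enough for this comparison. A proper token alignment would be needed to catch abbreviations that are transcribed but misspelled.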

To Reproduce

The audio comes from this YouTube video: https://www.youtube.com/watch?v=EdvILOSgPzY

Expected behavior

Could you clarify whether this is a missing capability of the model itself or a bug in decoding tokens from the model output?

Environment

I tried this both locally and on the https://www.modelscope.cn/studios/iic/SenseVoice demo, and the results are identical.

jason-ni · Aug 04 '24 23:08