
Does the hotword model reduce recognition accuracy?

Open · ShenJun77 opened this issue 1 year ago · 0 comments

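I tested with the code under /runtime/python/websocket in 2pass mode. I found that the offline models that support hotwords have lower overall recognition accuracy than the non-hotword offline model. The hotword models are more likely to output filler words such as "嗯" and "啊" ("um", "ah"), which the non-hotword model removes. Does the hotword model reduce recognition accuracy? If so, how can this be fixed?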

asr_model_online:

  • speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online

vad_model:

  • speech_fsmn_vad_zh-cn-16k-common-pytorch

punc_model:

  • punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727

Offline model (no hotwords):

  • speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch

Hotword models:

  • speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
  • speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404
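To make the comparison measurable rather than anecdotal, the non-hotword and hotword offline models could be scored on the same audio. Below is a minimal sketch. It assumes you already have a human reference transcript and the final (2pass offline) text from each model as plain strings. The helper names (`normalize`, `cer`, `filler_count`) and the sample sentences are hypothetical; they are not FunASR APIs or real model outputs. The sketch strips punctuation (the punc model inserts it), computes character error rate (CER) from a character-level Levenshtein distance, and counts the filler characters 嗯 and 啊.

```python
"""Hypothetical helper for comparing two ASR transcripts against a reference."""

FILLERS = ("嗯", "啊")  # filler characters mentioned in the issue
PUNCT = set("，。？！、；：,.?!;:")  # assumed punctuation set to ignore when scoring


def normalize(text: str) -> str:
    """Drop punctuation and whitespace so only recognized characters are compared."""
    return "".join(ch for ch in text if ch not in PUNCT and not ch.isspace())


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by normalized reference length."""
    ref_n, hyp_n = normalize(ref), normalize(hyp)
    return edit_distance(ref_n, hyp_n) / max(len(ref_n), 1)


def filler_count(text: str) -> int:
    """Number of filler characters in a transcript."""
    return sum(text.count(f) for f in FILLERS)


if __name__ == "__main__":
    # Hypothetical strings for illustration only, not real model outputs.
    reference = "今天天气很好。"
    outputs = {"offline": "今天天气很好。", "hotword": "嗯，今天天气啊很好。"}
    for name, hyp in outputs.items():
        print(name, f"CER={cer(reference, hyp):.3f}", f"fillers={filler_count(hyp)}")
```

Running this over a shared test set, with and without a hotword list, would show whether the hotword models actually have a higher CER and whether the difference comes mainly from inserted fillers.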

What's your environment?

  • OS (e.g., Linux): Linux
  • FunASR Version (e.g., 1.0.0): 1.0.19
  • PyTorch Version (e.g., 2.0.0): 1.13.1
  • How you installed funasr (pip, source): pip
  • Python version: 3.10.10

ShenJun77 · Jul 03 '24 08:07