PaddleSpeech ASR: problem with polyphonic characters

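When I use the streaming ASR service, recognition quality is poor. For example, one recognition result was “晴初是身份证”. How should this kind of problem be fixed? Should I switch models? I call the service like this:

from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
streaming_asr_server = ServerExecutor()
streaming_asr_server(config_file=args.config_file, log_file=args.log_file)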

Jun 06 '23 05:06 NLPerxue

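You could try switching to a different model or fine-tuning the model. If the problem is limited to polyphonic characters, you could also try adding a language model.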
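As a rough illustration of why a language model can help with this kind of error, the sketch below rescores two candidate transcripts with a tiny character-level bigram model. It is not PaddleSpeech's API or decoder. The training sentences, the acoustic scores, the weight `alpha`, and the candidate 请出示身份证 (a guess at what was actually said) are all assumptions made up for the example.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Count character contexts and bigrams over a small text corpus."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence) + ["</s>"]
        contexts.update(chars[:-1])                 # every symbol that has a successor
        bigrams.update(zip(chars[:-1], chars[1:]))  # adjacent character pairs
    vocab = {c for s in corpus for c in s} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def lm_log_prob(sentence, lm):
    """Log-probability of a sentence under the bigram model."""
    contexts, bigrams, vocab_size = lm
    chars = ["<s>"] + list(sentence) + ["</s>"]
    total = 0.0
    for prev, cur in zip(chars[:-1], chars[1:]):
        # Add-one smoothing; characters never seen in training simply get
        # the minimum count, which is good enough for a toy example.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def rescore(hypotheses, lm, alpha=1.0):
    """Pick the (text, acoustic_log_prob) pair with the best combined score."""
    return max(hypotheses, key=lambda h: h[1] + alpha * lm_log_prob(h[0], lm))


# Hypothetical in-domain text used to train the toy language model.
corpus = ["请出示身份证", "请出示您的证件", "身份证在这里"]
lm = train_bigram_lm(corpus)

# Hypothetical n-best list: the acoustic model slightly prefers the wrong text.
hypotheses = [("晴初是身份证", -4.0), ("请出示身份证", -4.5)]

print(rescore(hypotheses, lm, alpha=0.0)[0])  # acoustic score only
print(rescore(hypotheses, lm, alpha=1.0)[0])  # acoustic score plus language model
```

With `alpha=0.0` the acoustically preferred candidate wins. With `alpha=1.0` the language model penalizes the unseen character sequence enough for the other candidate to be selected. Real systems usually apply the same idea inside beam search with a much larger n-gram model.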

Jun 06 '23 06:06 zxcd

Thanks! Does adding a model mean using several models at the same time? Is there a demo for this?

Jun 06 '23 09:06 NLPerxue

For an introduction to language models, see https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/asr/ngram_lm.md?plain=1
The existing models are listed here: https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/paddlespeech/resource/pretrained_models.py
To use a language model, you could try adding it at a place similar to this one: https://github.com/PaddlePaddle/PaddleSpeech/blob/8aa9790c7518e7857fd2b8a894284cc24a9de51a/paddlespeech/resource/pretrained_models.py#LL371C6-L371C6

Jun 08 '23 02:06 zxcd

Thanks!

Jun 09 '23 01:06 NLPerxue

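How does the language model take effect at inference time? I switched to the corresponding model and downloaded the language model, but it does not seem to be used during prediction. @zxcd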

Jun 13 '23 08:06 yq-xfl

A question about the text used to train this language model: was it segmented with jieba's default dictionary, or with Baidu's own?

Aug 17 '23 07:08 wwfcnu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Oct 15 '23 14:10 stale[bot]

This issue is closed. Please re-open if needed.

Apr 27 '25 18:04 stale[bot]

This issue is closed. Please re-open if needed.

Jun 27 '25 03:06 stale[bot]