Context biasing based on Aho–Corasick
I fixed the bug described here. This PR can increase hotword recognition accuracy, but it may also increase false positives. The optimal context_score should be around 3, but please tune context_score on your own development set. The RTF also drops from 0.3569 to 0.33.
Any results on accuracy?
WER 23.1 -> 20.01 on our private hotword dataset.
Any WER comparison between AC and WFST?
The WER without context biasing is 33. WFST gives 23.1, while AC gives 20.01.
How many context phrases were used in your testing?
1216
Thanks for your support, and sorry that this PR has been stalled here for so long. We have our own internal implementation, https://github.com/wenet-e2e/wenet/pull/1937, built on the FST framework, which fits wenet better; it has already been merged. That work was finished around the same time as yours, but since it was needed for a paper, we did not submit it earlier. Its functionality is quite similar to this PR, which is why we have not continued following up on this one.