Context biasing based on Aho–Corasick

Open victor45664 opened this issue 2 years ago • 6 comments

I fixed the bug described here. This PR can improve hotword recognition accuracy, but it may also increase false positives. The optimal context_score should be around 3, but please tune context_score on your own development set. The RTF is also lower: 0.3569 -> 0.33.

May 25 '22 12:05 victor45664
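
For readers unfamiliar with the approach, here is a minimal sketch of Aho–Corasick matching used for hotword biasing. It is not the code from this PR: the `AhoCorasick` class, the `biased_score` helper, and the rule "add context_score once per matched token" are all assumptions made for illustration, and overlapping matches are simply counted more than once.

```python
from collections import deque


class AhoCorasick:
    """Illustrative Aho-Corasick automaton over token sequences."""

    def __init__(self, phrases):
        # Per node: outgoing edges, failure link, lengths of phrases ending here.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for phrase in phrases:
            if phrase:  # skip empty phrases
                self._insert(phrase)
        self._build_failure_links()

    def _insert(self, phrase):
        node = 0
        for tok in phrase:
            if tok not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][tok] = len(self.goto) - 1
            node = self.goto[node][tok]
        self.out[node].append(len(phrase))

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for tok, child in self.goto[node].items():
                f = self.fail[node]
                while f and tok not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(tok, 0)
                # Inherit phrases that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def step(self, state, tok):
        # Follow failure links until an edge for tok exists or the root is reached.
        while state and tok not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(tok, 0)

    def find(self, tokens):
        """Return (start, end) spans of every phrase occurrence in tokens."""
        state, matches = 0, []
        for i, tok in enumerate(tokens):
            state = self.step(state, tok)
            for length in self.out[state]:
                matches.append((i - length + 1, i + 1))
        return matches


def biased_score(tokens, base_score, automaton, context_score=3.0):
    # Assumed bonus scheme: context_score per matched hotword token.
    matched = sum(end - start for start, end in automaton.find(tokens))
    return base_score + context_score * matched


# Hypothetical hotword list and hypothesis, for illustration only.
hotwords = AhoCorasick([("context", "biasing"), ("wenet",)])
hyp = "wenet supports context biasing".split()
print(hotwords.find(hyp), biased_score(hyp, -12.0, hotwords))
```

In a real decoder the automaton state would be carried along each beam-search hypothesis via `step`, so the bonus is applied incrementally rather than after the fact; the whole-sequence `find` here only keeps the sketch short.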

Any results on accuracy?

May 25 '22 12:05 robin1001

> Any results on accuracy?

WER 23.1 -> 20.01 on our private hotword dataset.

May 25 '22 12:05 victor45664

Any WER comparison between AC and WFST?

May 25 '22 13:05 robin1001

> Any WER comparison between AC and WFST?

The WER without context biasing is 33. With WFST it is 23.1, and with AC it is 20.01.

May 25 '22 13:05 victor45664
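
To put the figures quoted in this thread in relative terms, the short sketch below computes the percentage reduction for each pair. The `relative_reduction` helper is a name made up for this example; the inputs are only the numbers reported above.

```python
def relative_reduction(before, after):
    """Relative reduction in percent (usable for WER or RTF)."""
    return 100.0 * (before - after) / before


# Figures as reported in this thread (private hotword dataset).
print(f"no biasing -> WFST: {relative_reduction(33.0, 23.1):.1f}%")
print(f"no biasing -> AC:   {relative_reduction(33.0, 20.01):.1f}%")
print(f"WFST -> AC:         {relative_reduction(23.1, 20.01):.1f}%")
print(f"RTF 0.3569 -> 0.33: {relative_reduction(0.3569, 0.33):.1f}%")
```

On these numbers, AC gives roughly a 13% relative WER reduction over WFST and roughly a 39% reduction over no biasing, on this one private test set.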

How many context phrases did you use in your testing?

May 25 '22 13:05 robin1001

> How many context phrases did you use in your testing?

1216

May 25 '22 13:05 victor45664

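Thanks for your support, and sorry that this PR was left sitting here for so long. This is our internal implementation: https://github.com/wenet-e2e/wenet/pull/1937. It is built on the FST framework, which fits wenet better, and it has now been merged. That work was finished at about the same time as yours, but because it was needed for a paper, we held off on submitting it. Since its functionality is quite similar to this PR, we did not keep following up on this PR.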

Sep 01 '23 03:09 robin1001
