
Context biasing based on Aho–Corasick

Open · victor45664 opened this issue 2 years ago · 6 comments


I fixed the bug described here. This PR can improve hotword recognition accuracy, but it may also increase false positives. The optimal context_score should be around 3, but please tune context_score on your own development set. The RTF is also lower: 0.3569 -> 0.33.

victor45664, May 25 '22 12:05
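For readers unfamiliar with the approach, here is a minimal sketch of Aho–Corasick context biasing in Python. It is not the code from this PR. The class `ContextGraph`, its `step`/`finalize` methods, the `biased_score` helper, and the use of single characters as tokens are all hypothetical. The sketch assumes each beam-search hypothesis carries its own automaton state and earns `context_score` for every token of a partially matched hotword. When a partial match breaks, the failure link falls back to the longest matching suffix and the bonus for the abandoned tokens is taken back, which is one common way to keep false positives in check.

```python
from collections import deque
from typing import Dict, List, Tuple


class ContextGraph:
    """Aho-Corasick automaton over hotword tokens (hypothetical sketch)."""

    def __init__(self, hotwords: List[str], context_score: float = 3.0) -> None:
        self.context_score = context_score
        # Node 0 is the root. Each node has goto edges, a failure link,
        # a depth (tokens matched so far) and an end-of-hotword flag.
        self.goto: List[Dict[str, int]] = [{}]
        self.fail: List[int] = [0]
        self.depth: List[int] = [0]
        self.is_end: List[bool] = [False]
        for word in hotwords:
            self._insert(word)
        self._build_fail_links()

    def _insert(self, word: str) -> None:
        node = 0
        for token in word:
            if token not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.depth.append(self.depth[node] + 1)
                self.is_end.append(False)
                self.goto[node][token] = len(self.goto) - 1
            node = self.goto[node][token]
        self.is_end[node] = True

    def _build_fail_links(self) -> None:
        # BFS: depth-1 nodes keep fail = root; deeper nodes follow the
        # parent's failure chain to the longest proper suffix in the trie.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for token, child in self.goto[node].items():
                f = self.fail[node]
                while f != 0 and token not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(token, 0)
                queue.append(child)

    def step(self, state: int, token: str) -> Tuple[float, int]:
        """Advance one token; return (score delta, next state)."""
        node = state
        while node != 0 and token not in self.goto[node]:
            node = self.fail[node]
        nxt = self.goto[node].get(token, 0)
        # The bonus is context_score per token of the current partial match,
        # so a negative delta takes back bonus for an abandoned prefix.
        delta = self.context_score * (self.depth[nxt] - self.depth[state])
        if self.is_end[nxt]:
            # Full hotword matched: keep its bonus and restart from the root.
            return delta, 0
        return delta, nxt

    def finalize(self, state: int) -> float:
        """Remove the bonus of an unfinished partial match at end of decoding."""
        return -self.context_score * self.depth[state]


def biased_score(graph: ContextGraph, tokens: str) -> float:
    """Total biasing bonus for one complete hypothesis."""
    total, state = 0.0, 0
    for token in tokens:
        delta, state = graph.step(state, token)
        total += delta
    return total + graph.finalize(state)


if __name__ == "__main__":
    graph = ContextGraph(["abc", "bd"], context_score=3.0)
    for hyp in ["abc", "abd", "abx", "xbdy"]:
        print(hyp, biased_score(graph, hyp))  # 9.0, 6.0, 0.0, 6.0
```

Resetting to the root after a full match keeps the bookkeeping simple. The trade-off is that a hotword which is a prefix of a longer hotword ends the match early, and a hotword that only appears as a suffix of an in-progress longer match is not rewarded. A production implementation may handle both cases differently.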

Any results on accuracy?

robin1001, May 25 '22 12:05

> Any results on accuracy?

WER 23.1 -> 20.01 on our private hotword dataset.

victor45664, May 25 '22 12:05

Any WER comparison between AC and WFST?

robin1001, May 25 '22 13:05

> Any WER comparison between AC and WFST?

The WER without context biasing is 33. With WFST-based biasing it is 23.1, and with AC-based biasing it is 20.01.

victor45664, May 25 '22 13:05
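To put the three figures reported in this thread on a common scale, the relative WER reductions can be computed directly from them. The function name `relative_reduction` is only for illustration; no unit is stated for the WER values, but the ratio is the same either way.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    return (baseline - new) / baseline * 100.0


# WER figures as reported above (private hotword set, 1216 context phrases).
wer = {"none": 33.0, "WFST": 23.1, "AC": 20.01}

print(f"WFST vs. no biasing: {relative_reduction(wer['none'], wer['WFST']):.1f}%")  # 30.0%
print(f"AC vs. no biasing:   {relative_reduction(wer['none'], wer['AC']):.1f}%")    # 39.4%
print(f"AC vs. WFST:         {relative_reduction(wer['WFST'], wer['AC']):.1f}%")    # 13.4%
```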

How many context phrases were in your test set?

robin1001, May 25 '22 13:05

> How many context phrases were in your test set?

1216

victor45664, May 25 '22 13:05

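Thanks for your support, and sorry this PR has sat here for so long. Our internal implementation, https://github.com/wenet-e2e/wenet/pull/1937, is built on the FST framework, which fits better with WeNet, and it has now been merged. That work was finished at around the same time as yours, but because we needed it for a paper, we did not submit it earlier. Since its functionality is quite similar to this PR, we did not keep following up on yours.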

robin1001, Sep 01 '23 03:09