Non-exported nested rule references recognized as top level rules

Open ileben opened this issue 4 years ago • 4 comments

In the following example I would expect nothing to happen unless I prepend a letter keyword with "spell". However, if I just say "alpha", I get the printout "RECOGNIZED SPELLING". Note that in this example the nested, referenced rule Alphabet is not exported. _spelling.txt

For the sake of reproduction, I tested this by running: python -m dragonfly load _spelling.py --engine kaldi -o vad_padding_end_ms=300 --no-recobs-messages

I tried the same example with the test engine: python -m dragonfly test _spelling.py --delay 0.1 and it worked as expected: nothing is recognized unless I prepend it with "spell". That makes me think this issue is caused by the Kaldi backend.

Name: dragonfly2 Version: 0.24.0

Name: kaldi-active-grammar Version: 1.4.0

Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)] on win32

ileben avatar Jul 12 '20 05:07 ileben

I think this is working as designed. It is not that the Alphabet rule is being exported, but rather that the Spelling rule is being recognized. You can see this by enabling the recognition observers or the engine logging, and from the "RECOGNIZED SPELLING" printout. With only the Spelling rule exported, you are effectively telling the engine that anything and everything it hears will be of the form "spell ", so it interprets the slightest random noise in your utterance as "spell". A possible mitigation for this is in development here: https://github.com/dictation-toolbox/dragonfly/pull/258. Please let me know of any further trouble.
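
To illustrate the forced-choice effect described above, here is a toy sketch in plain Python. It is not how Kaldi decodes audio: the string similarity from difflib merely stands in for an acoustic score, and the names GRAMMAR, similarity and decode are made up for this illustration. The point is only that a decoder restricted to "spell ..." phrases has to pick one of them, whatever it hears.

```python
from difflib import SequenceMatcher

# Toy stand-in for a grammar whose only exported rule is "spell <letter>".
LETTERS = ["alpha", "bravo", "charlie"]
GRAMMAR = ["spell " + letter for letter in LETTERS]


def similarity(a, b):
    """Crude text similarity in [0, 1]; a placeholder for an acoustic score."""
    return SequenceMatcher(None, a, b).ratio()


def decode(heard):
    """Return the grammar phrase closest to what was heard.

    There is no "none of the above" option, so even a poor match
    is still reported as a recognition of the only exported rule.
    """
    return max(GRAMMAR, key=lambda phrase: similarity(heard, phrase))


if __name__ == "__main__":
    # "alpha" alone is not in the grammar, yet the closest phrase wins.
    print(decode("alpha"))
    print(decode("spell bravo"))
```

In this toy, saying just "alpha" resolves to the phrase "spell alpha" because nothing else in the grammar competes with it, which mirrors the behaviour reported in this issue.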

daanzu avatar Jul 13 '20 15:07 daanzu

@ileben Oh, I forgot to mention another easy thing to do: Add a global catch-all Dictation Rule with a no-op Action.

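from dragonfly import ActionBase, Dictation, MappingRule

# Catch-all rule: any dictation matches it and triggers a no-op action.
# Assumes `grammar` is your existing dragonfly Grammar instance.
grammar.add_rule(MappingRule(
    name="noise sink",
    mapping={
        "<dictation>": ActionBase(),  # a bare ActionBase does nothing
    },
    extras=[Dictation("dictation")],
))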

daanzu avatar Jul 14 '20 02:07 daanzu

A confidence threshold for recognitions sounds like the most sensible solution to this problem to me. However, I can imagine it is functionally equivalent to having the engine constantly decide, based on some confidence level, whether a command or something else was spoken. I like the idea of being able to control this threshold via an engine parameter, so I tried to install the fix from the linked pull request, but I ran into this: https://github.com/daanzu/kaldi-active-grammar/issues/29
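
As a rough sketch of the confidence-threshold idea discussed above, the snippet below rejects a recognition when the best hypothesis scores too low. Everything here is hypothetical: the accept function, the (phrase, confidence) tuples and the numbers are invented for illustration, and this is not the implementation in the linked pull request nor the semantics of any real engine option.

```python
def accept(hypotheses, min_confidence):
    """Return the best phrase if it clears the threshold, else None.

    hypotheses: list of (phrase, confidence) pairs, confidence in [0, 1].
    min_confidence: hypothetical cut-off below which nothing is recognized.
    """
    if not hypotheses:
        return None
    phrase, confidence = max(hypotheses, key=lambda h: h[1])
    return phrase if confidence >= min_confidence else None


if __name__ == "__main__":
    # A weak match (e.g. noise forced onto the spelling rule) is dropped.
    print(accept([("spell alpha", 0.35)], min_confidence=0.6))
    # A strong match passes through unchanged.
    print(accept([("spell alpha", 0.92), ("spell bravo", 0.40)], min_confidence=0.6))
```

The design trade-off is the usual one: a higher cut-off suppresses more spurious recognitions but also rejects more genuine commands spoken unclearly.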

ileben avatar Jul 14 '20 09:07 ileben

Neither of the two solutions has the desired effect.

The newly added engine option expected_error_rate_threshold seems to have no effect.

The noise sink solution behaves erratically: if I say "alpha" it still recognizes the spelling rule, but if I say "bravo" it recognizes dictation. Anything prepended by "spell" is correctly recognized as the spelling rule, but any combination of multiple letters (e.g. "bravo charlie") is also recognized as the spelling rule, regardless of whether it is prepended by "spell".

ileben avatar Jul 15 '20 13:07 ileben