dragonfly
Engine hook for invalidating recognitions
The ability to invalidate the current recognition before it is processed would be useful in certain scenarios. For instance, this could be used to pretend the recognition failed if the window handle changed between speech start (on_begin()) and speech end (just before on_recognition() is called).
I have in mind an engine hook that returns whether or not the recognition should be invalidated, i.e. a predicate function. If True is returned, the on_failure() recobs callback should be invoked instead of on_recognition(). Given that this changes recognition state, rather than merely observing it, I think it makes more sense to add this as an engine method instead of an additional recobs callback.
It should be noted that with Dragon and WSR, we can only pretend the recognition was a failed one; the results boxes will still show the words. Additionally, recognitions received from these engine back-ends not directed at any specific Dragonfly grammar should be ignored for this predicate, since we are not really processing them.
Using the above example, here is what I have in mind:
from dragonfly import Window, get_engine, register_beginning_callback

# Store the window handle at the start of each utterance.
_on_begin_window_handle = None

def on_begin():
    global _on_begin_window_handle
    _on_begin_window_handle = Window.get_foreground().handle

# Register the on_begin() callback.
register_beginning_callback(on_begin)

# Invalidate the recognition if the handle changed.
def invalidation_hook():
    if _on_begin_window_handle is None:
        return False  # Don't invalidate.

    # Return whether or not the recognition should be invalidated.
    handle = Window.get_foreground().handle
    return handle != _on_begin_window_handle

# Set the invalidation hook (proposed method; not yet implemented).
get_engine().set_recognition_invalidation_hook(invalidation_hook)
I haven't yet investigated how feasible this is to implement for each engine. I would be interested in what you think @daanzu.
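To make the intended behaviour concrete, here is a minimal sketch of how an engine might consult such a hook before notifying observers. It is only an illustration of the proposal: SketchEngine, LoggingObserver, register_observer() and process_words() are hypothetical names and are not part of Dragonfly. Only set_recognition_invalidation_hook() comes from the proposal above.

```python
class SketchEngine(object):
    """Hypothetical engine used only to illustrate the proposed hook."""

    def __init__(self):
        self._invalidation_hook = None
        self._observers = []

    def set_recognition_invalidation_hook(self, hook):
        # The hook is a predicate taking no arguments.
        self._invalidation_hook = hook

    def register_observer(self, observer):
        self._observers.append(observer)

    def process_words(self, words):
        # Ask the predicate, if one is set, whether to invalidate.
        hook = self._invalidation_hook
        invalidated = bool(hook()) if hook is not None else False

        # Notify observers of a failure instead of a recognition.
        for observer in self._observers:
            if invalidated:
                observer.on_failure()
            else:
                observer.on_recognition(words)

        # Rule processing would only continue if this is True.
        return not invalidated


class LoggingObserver(object):
    """Hypothetical observer that records which callback was invoked."""

    def __init__(self):
        self.events = []

    def on_recognition(self, words):
        self.events.append(("recognition", tuple(words)))

    def on_failure(self):
        self.events.append(("failure",))


engine = SketchEngine()
observer = LoggingObserver()
engine.register_observer(observer)

engine.process_words(["hello"])                         # No hook set yet.
engine.set_recognition_invalidation_hook(lambda: True)  # Always invalidate.
engine.process_words(["world"])
```

With no hook set, the observer sees a normal recognition. Once the predicate returns True, it sees a failure instead and the words are not processed.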
That does sound like it could be handy. Actually, I already implemented something similar to that in the Kaldi backend: get_engine().ignore_current_phrase(). Adding support for this proposal would not be hard. And as you can probably tell from my API decisions, I am always a big fan of adding predicates and callbacks!
Ah, nice! I had forgotten about that method. The Sphinx engine has something similar: get_engine().cancel_recognition(). I believe that Sphinx method wouldn't work with mimic(). Good to know this wouldn't be too difficult for Kaldi or Sphinx.
Yes, hooks, predicates and callbacks are nice. :-)
I think perhaps they are more appropriate in this case, since you would have to call ignore_current_phrase() after on_begin() and before on_recognition(). You wouldn't need to worry about that with this approach.
This proposed approach is definitely more flexible, and I am in favor of implementing it. However, it may be nice to still have the simpler method available as well, so the user doesn't have to track the state of recognition through recognition observers if they don't otherwise need to.
Agreed, having both sounds good to me. With that in mind, what do you think the name of the simpler method should be?
Sounds good. I'm not sure what name is best for the simple method: ignore_current_phrase made sense to me at the time, but I'm not entirely happy with it. I was trying to get across that it doesn't actually cancel or change anything in all of the processing (it doesn't start a new phrase for instance), but only ignores the eventual result with regard to actions.
Hmm okay then. I think invalidate_current_phrase() might be better. It seems more consistent to invalidate the recognition and notify observers of that via on_failure(), rather than just doing no processing.
Thinking about it, this inconsistency is also present with the Sphinx engine training mode. That should probably be removed altogether. There are better ways to implement it anyway.
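As a rough sketch of the simpler method discussed above, the per-utterance state could be as small as one flag that is reset at speech start and consumed when the result arrives. PhraseState, begin() and finish() are hypothetical names for illustration only. Only invalidate_current_phrase() is a name suggested in this thread, and no engine is assumed to implement it this way.

```python
class PhraseState(object):
    """Hypothetical per-utterance state an engine might keep."""

    def __init__(self):
        self.in_phrase = False
        self.invalidated = False
        self.events = []

    def begin(self):
        # Speech start: a new phrase begins with a clean flag.
        self.in_phrase = True
        self.invalidated = False

    def invalidate_current_phrase(self):
        # Only meaningful between speech start and the final result.
        if self.in_phrase:
            self.invalidated = True

    def finish(self, words):
        # Speech end: notify via on_failure() if the phrase was invalidated.
        if self.invalidated:
            self.events.append("on_failure")
        else:
            self.events.append("on_recognition")
        self.in_phrase = False
        self.invalidated = False


state = PhraseState()

state.invalidate_current_phrase()  # Too early: no phrase in progress.
state.begin()
state.finish(["first"])

state.begin()
state.invalidate_current_phrase()  # Within the phrase: takes effect.
state.finish(["second"])
```

The first call happens outside a phrase and has no effect, which is the timing constraint mentioned earlier in the thread. The second call lands between speech start and the result, so observers would be told the recognition failed.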
Sorry for the late reply.
Looking back over this issue today, it occurs to me that Dragonfly users have been able to do this for a long time via the special Grammar.process_recognition() callback documented here. A Grammar subclass implementing that callback may return False if the final window context doesn't match the starting one. Doing so will prevent rule processing.
That grammar callback has been working correctly for Dragon and WSR users for ages. It will work properly for all engine back-ends, including Kaldi, in the next release -- v1.0.0.
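A minimal sketch of that idea follows, written as a plain mixin so it runs without a speech engine. WindowGuardMixin, FakeGrammar and current_handle() are hypothetical names. The sketch assumes that Grammar subclasses can override _process_begin(executable, title, handle) to see the window handle at speech start, and that real code would implement current_handle() with Window.get_foreground().handle.

```python
class WindowGuardMixin(object):
    """Hypothetical mixin intended for a dragonfly Grammar subclass."""

    _begin_handle = None

    def current_handle(self):
        # Assumption: real code would return
        # Window.get_foreground().handle here.
        raise NotImplementedError

    def _process_begin(self, executable, title, handle):
        # Assumed start-of-phrase override point: remember the
        # foreground window handle at speech start.
        self._begin_handle = handle

    def process_recognition(self, words):
        # Returning False prevents rule processing; True allows it.
        if self._begin_handle is None:
            return True
        return self.current_handle() == self._begin_handle


class FakeGrammar(WindowGuardMixin):
    """Stand-in used here instead of a real Grammar subclass."""

    def __init__(self):
        self.foreground = 100

    def current_handle(self):
        return self.foreground


grammar = FakeGrammar()
grammar._process_begin("notepad.exe", "Untitled", 100)
same_window = grammar.process_recognition(["hello"])

grammar.foreground = 200  # The foreground window changed mid-utterance.
changed_window = grammar.process_recognition(["hello"])
```

In real use the mixin would be combined with Grammar, e.g. class GuardedGrammar(WindowGuardMixin, Grammar), and the rules added to that grammar would only be processed when the window is unchanged.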
Could this be used as part of a mechanism for canceling mid-utterance?
As part of such a mechanism, yes. Stopping rule processing would be good enough for every engine except Dragon and WSR (sapi5shared) -- they both have built-in grammar rules.
For Dragon, you can cancel recognition from the results box menu. As far as I know, you can't do that through Natlink. I don't know about WSR.