
UWPVA should use KeywordRecognizer

Open trrwilson opened this issue 4 years ago • 0 comments

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [X] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Overview

v1.12 of the Speech SDK added support for a new KeywordRecognizer. It lets applications perform keyword spotting before authenticating with the speech service, and it will greatly improve cold-start latency, especially in real-world environments where the token is retrieved from an intermediate source.

The UWP Voice Assistant sample app, specifically the DirectLineSpeechDialogBackend, should integrate this new KeywordRecognizer functionality to demonstrate its use in an easily reusable way.

As a high-level summary of the work involved:

  • The backend initialization should no longer initialize a DialogServiceConnector immediately. Instead, it should create a KeywordRecognizer.
  • An audio turn start with confirmation required should plumb the input audio into the KeywordRecognizer (via the same sink currently used directly by the connector).
  • Confirmation timeouts should be tied to this new KeywordRecognizer rather than to the connector.
  • Upon confirmation (the recognized event), the connector should be initialized just in time, an AudioDataStream should be retrieved from the KeywordRecognitionResult, and the stream data should be injected into the connector via a semi-persistent adapter object. The adapter is needed because AudioDataStream cannot currently work independently as a stream input source.
  • Everything else should generally then work the same way!
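
To make the intended control flow concrete, here is a minimal sketch of the steps above. It is not Speech SDK code: `FakeKeywordRecognizer`, `FakeConnector`, `Backend`, and `token_provider` are hypothetical stand-ins invented for illustration, keyword spotting is reduced to a byte-pattern search, and confirmation timeouts are left out. It only shows the ordering: recognizer first, connector and token retrieval deferred until confirmation, then the confirmed audio replayed into the connector ahead of live audio.

```python
from typing import Callable, Optional


class FakeKeywordRecognizer:
    """Hypothetical stand-in for a keyword recognizer (NOT the real SDK API)."""

    def __init__(self, keyword: bytes) -> None:
        self.keyword = keyword
        self.buffered = bytearray()

    def write(self, chunk: bytes) -> Optional[bytes]:
        """Buffer audio; once the keyword is seen, return audio from the keyword onward."""
        self.buffered.extend(chunk)
        index = self.buffered.find(self.keyword)
        if index < 0:
            return None
        return bytes(self.buffered[index:])


class FakeConnector:
    """Hypothetical stand-in for the dialog connector; construction fetches a token."""

    def __init__(self, token_provider: Callable[[], str]) -> None:
        self.token = token_provider()  # the expensive step we want to defer
        self.received = bytearray()

    def push_audio(self, chunk: bytes) -> None:
        self.received.extend(chunk)


class Backend:
    """Creates only the recognizer up front; the connector is built on confirmation."""

    def __init__(self, keyword: bytes, token_provider: Callable[[], str]) -> None:
        self._token_provider = token_provider
        self.recognizer = FakeKeywordRecognizer(keyword)
        self.connector: Optional[FakeConnector] = None

    def on_audio(self, chunk: bytes) -> None:
        if self.connector is not None:
            # Already confirmed: live audio goes straight to the connector.
            self.connector.push_audio(chunk)
            return
        confirmed_audio = self.recognizer.write(chunk)
        if confirmed_audio is not None:
            # Just-in-time initialization, then replay the confirmed audio
            # (this replay plays the role of the adapter object).
            self.connector = FakeConnector(self._token_provider)
            self.connector.push_audio(confirmed_audio)
```

In this sketch no token is requested until the keyword is confirmed, which is the source of the cold-start improvement described in the overview. A real implementation would also need the confirmation timeout and teardown paths.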

trrwilson, Jun 18 '20 16:06