Cognitive-Services-Voice-Assistant
UWPVA should use KeywordRecognizer
This issue is for a: (mark with an x)
- [ ] bug report -> please search issues before submitting
- [X] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Overview
v1.12 of the Speech SDK added support for a new KeywordRecognizer. This provides a way for applications to perform keyword spotting before authenticating with the speech service, and it will greatly improve cold-start latency, especially in real-world environments that involve token retrieval from an intermediate source.
The UWP Voice Assistant sample app, most specifically the DirectLineSpeechDialogBackend, should integrate this new KeywordRecognizer functionality to demonstrate its use in an easily reusable way.
As a high-level summary of the work involved:
- The backend initialization should no longer initialize a DialogServiceConnector immediately. Instead, it should create a KeywordRecognizer.
- When an audio turn starts with confirmation required, the input audio should be plumbed into the KeywordRecognizer (via the same sink currently used directly by the connector)
- Confirmation timeouts should be tied to this new KeywordRecognizer rather than the connector
- Upon confirmation (recognized event), the connector should be just-in-time initialized, an AudioDataStream should be retrieved from the KeywordRecognitionResult, and the stream data should be injected into the connector via a semi-persistent adapter object (AudioDataStream cannot currently work independently as a stream input source)
- Everything else should generally then work the same way!
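
To make the proposed control flow concrete, here is a minimal, language-neutral sketch in Python. It does not use the Speech SDK: `FakeKeywordRecognizer`, `FakeConnector`, and `Backend` are hypothetical stand-ins invented for illustration, the "keyword" is just a byte pattern, and confirmation timeouts are left out. It only shows the ordering described above: no connector (and no token fetch) until the keyword is confirmed, then the buffered audio is replayed into the freshly created connector.

```python
from typing import Callable, List, Optional


class FakeKeywordRecognizer:
    """Hypothetical stand-in for an on-device keyword recognizer (not the SDK class)."""

    def __init__(self, keyword: bytes) -> None:
        self.keyword = keyword
        self.buffer = b""

    def push(self, chunk: bytes) -> Optional[bytes]:
        """Feed audio; once the keyword is seen, return audio from the keyword onward."""
        self.buffer += chunk
        idx = self.buffer.find(self.keyword)
        return self.buffer[idx:] if idx >= 0 else None


class FakeConnector:
    """Hypothetical stand-in for the cloud connector; creating it needs a token."""

    def __init__(self, token: str) -> None:
        self.token = token
        self.received: List[bytes] = []

    def write(self, chunk: bytes) -> None:
        self.received.append(chunk)


class Backend:
    """Creates only the keyword recognizer up front; the connector is just-in-time."""

    def __init__(self, keyword: bytes, get_token: Callable[[], str]) -> None:
        self.get_token = get_token
        self.recognizer = FakeKeywordRecognizer(keyword)
        self.connector: Optional[FakeConnector] = None

    def on_audio(self, chunk: bytes) -> None:
        if self.connector is not None:
            # Already confirmed: audio flows straight to the connector.
            self.connector.write(chunk)
            return
        confirmed_audio = self.recognizer.push(chunk)
        if confirmed_audio is not None:
            # Confirmation: fetch the token and create the connector only now,
            # then replay the audio captured from the keyword onward.
            self.connector = FakeConnector(self.get_token())
            self.connector.write(confirmed_audio)


if __name__ == "__main__":
    backend = Backend(b"hey", lambda: "example-token")
    for piece in (b"noise ", b"he", b"y what time is it"):
        backend.on_audio(piece)
    print(backend.connector.received)
```

In the real sample the replay step would go through the adapter object mentioned above, since the stream retrieved from the recognition result cannot be handed to the connector directly.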