web-speech-cognitive-services
Integration with React Speech Recognition
Hi @compulim! I'm the author of the react-speech-recognition hook. I recently made a new release that supports polyfills such as yours. Indeed, yours is currently the first and only polyfill that works (more or less) with react-speech-recognition. You've done a great job - for the most part, it worked smoothly while I was testing the two together.
Here's some feedback on a few wrinkles I encountered while testing the integration between the two libraries:
- I think this is an issue in the underlying Cognitive Services SDK, but I found that providing a subscription key rather than an authorization token resulted in authorization errors from Azure. I also found this to be the case in your playground. With that in mind, it would be cool if your polyfill converted subscription keys to authorization tokens when consumers provide them. The token endpoint seems pretty stable, so you could make a good guess at it (`https://${REGION}.api.cognitive.microsoft.com/sts/v1.0/issuetoken`) if the consumer didn't provide it themselves. You could also handle the caching of the authorization tokens.
- I think this was raised in another GitHub issue, but the Speech Recognition events your polyfill emits don't set `resultIndex`. react-speech-recognition makes use of this while managing the transcript. It could be set to `results.length - 1`, which is what I did as a consumer.
- There seems to be a race condition where sometimes, when calling `stop`, a "final" result that was emitted before the stop gets emitted a second time. I say race condition because I wasn't able to reproduce this consistently. It doesn't happen with the Chrome browser Speech Recognition engine. I was able to find a workaround, but it would be nice to get this fixed.
- Perhaps related to the previous point, but I found that when stopping and immediately restarting the polyfill on Firefox or Safari, it would become unresponsive. I do this when changing languages. It's hard to tell what's going on, but again I assume a race condition somewhere.
- Azure returns 400 responses if no language is explicitly set by the polyfill consumer - it looks like the polyfill uses the language from the DOM by default, which is not always a valid Azure language code.
- I'm not sure if this is a solvable problem, but I found the need to wrap the polyfill setup in an async function a bit cumbersome. I'm not totally convinced it's necessary - under the hood, most of your async logic actually happens when the consumer asks the polyfill to start listening rather than when the polyfill is instantiated. The consumer still has to perform some async logic to get the authorization token - however, as mentioned above, the polyfill could do this work for the consumer, with that logic running once on the first call to `start` or in the background on creation.
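To illustrate the token-conversion point above, here is a rough sketch of what the exchange-and-cache logic might look like. The endpoint URL and the `Ocp-Apim-Subscription-Key` header follow Azure's standard issueToken convention; the `createTokenProvider` name, the ~9-minute refresh window (Azure tokens last about 10 minutes), and the injectable `fetchFn` are my own assumptions, not part of either library:

```javascript
// Hypothetical helper: exchange an Azure subscription key for an
// authorization token and cache it until shortly before it expires.
function createTokenProvider({ region, subscriptionKey, fetchFn = fetch }) {
  // Refresh a minute before the ~10-minute Azure token lifetime elapses.
  const TOKEN_LIFETIME_MS = 9 * 60 * 1000;
  let cachedToken = null;
  let fetchedAt = 0;

  return async function getAuthorizationToken() {
    const now = Date.now();
    if (cachedToken && now - fetchedAt < TOKEN_LIFETIME_MS) {
      return cachedToken; // still fresh, reuse it
    }
    const res = await fetchFn(
      `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issuetoken`,
      {
        method: 'POST',
        headers: { 'Ocp-Apim-Subscription-Key': subscriptionKey }
      }
    );
    if (!res.ok) {
      throw new Error(`Token request failed with status ${res.status}`);
    }
    cachedToken = await res.text(); // the response body is the raw token
    fetchedAt = now;
    return cachedToken;
  };
}
```

The polyfill could call something like this lazily on the first `start`, which would also address the async-setup point further down.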
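For the `resultIndex` point, the consumer-side workaround I used amounts to something like the following sketch (the `patchResultIndex` helper name is hypothetical; the `results.length - 1` assignment is the behaviour described above):

```javascript
// Hypothetical consumer-side shim: if a result event arrives without
// resultIndex, assume the newly emitted result is the last one.
function patchResultIndex(event) {
  if (event.resultIndex === undefined && event.results) {
    event.resultIndex = event.results.length - 1;
  }
  return event;
}

// Usage: wrap the polyfill's result handler before passing events on,
// e.g. recognition.onresult = event => handleResult(patchResultIndex(event));
```

Setting this inside the polyfill itself would save every consumer from carrying the same shim.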
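On the 400-response point, a defensive consumer can sidestep the problem by always passing an explicit language rather than trusting the DOM. A minimal sketch, assuming a simple language-REGION shape check (the `resolveSpeechLang` helper and the `'en-US'` default are illustrative choices, not anything either library ships):

```javascript
// Pick an explicit BCP-47 code Azure accepts instead of trusting
// document.documentElement.lang, which may be empty or a bare 'en'.
function resolveSpeechLang(documentLang, fallback = 'en-US') {
  // Azure expects full language-REGION codes such as 'en-US' or 'fr-FR'.
  return /^[a-z]{2,3}-[A-Z]{2}$/.test(documentLang || '')
    ? documentLang
    : fallback;
}
```

The polyfill could apply a similar check before defaulting to the DOM language.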
Thanks for making this polyfill and I hope some of the above is useful. If you want to donate more of your speech recognition polyfill-making skills, there is a similar WIP project for AWS Transcribe that I'd love to be able to integrate with. There's also a general discussion about web speech recognition polyfills here.