Sample no longer works with Custom Speech service after //BUILD 2018 product updates

Open mikebranstein opened this issue 6 years ago • 20 comments

The JavaScript SDK only works with Bing Speech API endpoints. Custom Speech endpoints need to be supported. PR incoming.

mikebranstein avatar May 18 '18 12:05 mikebranstein

PR #82 submitted.

mikebranstein avatar May 18 '18 15:05 mikebranstein

Hi,

I used the sample and changed the URI to wss://westus.stt.speech.microsoft.com in speechConnectionFactory.js, but I keep getting a 403 Forbidden error.

May I know what should be the URI?

Thanks in Advance.

mageshpurpleslate avatar May 23 '18 14:05 mageshpurpleslate

@mageshpurpleslate it depends. If you're using the Bing Speech service, nothing needs to change. If you're going to use the Custom Speech Service, you need to append an endpoint Id to the URI. Check out PR #82 for the details on everything that needs to change.

mikebranstein avatar May 23 '18 14:05 mikebranstein

Mike,

Thanks a lot for your help. Works like a charm. Pretty nicely done.

Is it acceptable to send the API subscription key as a query parameter? Are you planning to make any changes to that?

Regards,

Magesh

mageshpurpleslate avatar May 24 '18 03:05 mageshpurpleslate

@mageshpurpleslate I'm glad this worked for you - I recall starting out with Bing and Custom Speech ~ a year ago and the samples were pretty rough.

There are two ways to authenticate to the speech services over WebSockets. The first is to send the subscription key in the query string. That is acceptable because the connection is encrypted with TLS (WSS). The second is to pre-authenticate with an HTTP POST to the Cognitive Services secure token service. This returns a bearer token that is added to the WebSocket connection's Authorization header. Docs on how to do this are here.
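
A minimal sketch of the two options in JavaScript. The parameter and header names (`Ocp-Apim-Subscription-Key`, `Authorization: Bearer ...`) are the ones quoted elsewhere in this thread; the helper `buildAuth` and its return shape are hypothetical and not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): describes how the credential
// would be attached to the WebSocket connection for each auth mode.
function buildAuth({ subscriptionKey, token }) {
  if (token) {
    // Option 2: bearer token obtained beforehand from the token service.
    return { headers: { Authorization: `Bearer ${token}` }, query: {} };
  }
  if (subscriptionKey) {
    // Option 1: subscription key sent as a query-string parameter over WSS.
    return { headers: {}, query: { "Ocp-Apim-Subscription-Key": subscriptionKey } };
  }
  throw new Error("Provide either a subscription key or a token");
}
```

Keeping the two modes behind one function makes it easy to switch from key-based to token-based auth without touching the connection code.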

mikebranstein avatar May 24 '18 23:05 mikebranstein

@mikebranstein - I tried the custom speech implementation with your proposed code changes, but I am getting a "403 Forbidden" error on the WSS call. The path I copied from the F12 dev tools looks like:

wss://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=https://westus.api.cognitive.microsoft.com/sts/v1.0&format=simple&language=en-US&Ocp-Apim-Subscription-Key=<...... key......>&X-ConnectionId=<..... connection id .... >

Is this a valid path format? Did you ever run into a 403 error during your testing?

mraguraman3 avatar Jul 09 '18 07:07 mraguraman3

@mraguraman3 I believe you are placing the entire endpoint URL in the "Custom Speech Endpoint ID" textbox. Instead, use the Endpoint ID, which is a GUID. You can find the endpoint ID on the custom speech portal.
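
As a rough illustration of the difference, here is a sketch that builds the URL and rejects anything that is not a GUID in the `cid` parameter. The host, path, and parameter names are copied from the URLs quoted in this thread; `buildCustomSpeechUrl` and `GUID_PATTERN` are hypothetical names, not SDK APIs.

```javascript
// Hypothetical helper: builds the Custom Speech WebSocket URL.
// Host, path and parameter names are copied from the URLs quoted in this thread.
const GUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function buildCustomSpeechUrl({ region, endpointId, mode = "interactive", language = "en-US" }) {
  // The cid parameter must be the endpoint ID (a GUID), not the endpoint URL.
  if (!GUID_PATTERN.test(endpointId)) {
    throw new Error(`Expected a GUID endpoint ID, got: ${endpointId}`);
  }
  const url = new URL(
    `wss://${region}.stt.speech.microsoft.com/speech/recognition/${mode}/cognitiveservices/v1`
  );
  url.searchParams.set("cid", endpointId);
  url.searchParams.set("format", "simple");
  url.searchParams.set("language", language);
  return url.toString();
}
```

Passing the full endpoint URL, as in the 403 case above, would throw here instead of producing a request the service rejects.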

mikebranstein avatar Jul 09 '18 12:07 mikebranstein

Thanks a lot @mikebranstein, it's working now that I've entered the endpoint ID.

You also mentioned that we can use token-based authentication. In that case, do we not need to pass the endpoint ID in the HTTP POST header? Is the subscription key alone enough to generate the token?

mraguraman3 avatar Jul 10 '18 07:07 mraguraman3

@mraguraman3 - yes, token-based auth is also available. I did not use token-based auth because the original solution used the query string parameter auth. I wanted to augment the solution in a specific way for this PR. A different PR would be necessary to change the auth.

mikebranstein avatar Jul 10 '18 12:07 mikebranstein

Thanks @mikebranstein .

Anyway, I can confirm that token-based auth is not working for me with the custom speech implementation.

Although the token is generated successfully using the subscription key, I get a 401 Unauthorized when connecting to the WebSocket. Below is the wss call format:

wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#

mraguraman3 avatar Jul 12 '18 13:07 mraguraman3

@mraguraman3 token based auth does work, but it's tricky. I have a C# SDK I had to roll for the Custom Speech Service websocket speech protocol before Microsoft released their own.

mikebranstein avatar Jul 12 '18 13:07 mikebranstein

Great @mikebranstein, but I am using a JavaScript Node app to generate the token. Anyway, all I want to know is whether this is a valid wss call format, or am I missing something?

wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#

mraguraman3 avatar Jul 13 '18 05:07 mraguraman3

@mraguraman3 the C# SDK was released after //BUILD this year. You can find it on NuGet: https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech/.

mikebranstein avatar Jul 13 '18 10:07 mikebranstein

Thanks @mikebranstein, but I don't think this SDK has an option to provide the endpoint ID for custom speech.

Only EndpointURL is supported, which I believe is the actual HTTP host for the speech service. Here is the documentation of the supported properties in the C# SDK:

https://docs.microsoft.com/en-gb/dotnet/api/microsoft.cognitiveservices.speech.speechfactory?view=azure-dotnet

Do you have any plans to support token auth in "SpeechToText-WebSockets-Javascript" for custom speech?

mraguraman3 avatar Jul 16 '18 08:07 mraguraman3

@mraguraman3 from what I understand, the EndpointURL property is part of the URL. So, the EndpointURL for custom speech could be wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#.

Underneath, the SDK streams audio using the Speech Service WebSocket protocol, outlined here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol.

If you were going to implement the speech protocol yourself, you'd have to request an auth token using your subscription key (as in the code snippet below).

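// Requires: using System.Net.Http; using System.Threading.Tasks;
private async Task<string> FetchToken()
{
  using (var client = new HttpClient())
  {
    // The subscription key goes in this header; the response body is the token.
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<Subscription Key>");
    var tokenUri = "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken";

    var result = await client.PostAsync(tokenUri, null);
    // Fail on 401/403 rather than returning an error body as if it were a token.
    result.EnsureSuccessStatusCode();
    return await result.Content.ReadAsStringAsync();
  }
}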

When you have that token, you can use a ClientWebSocket and set the Authorization bearer token on the web socket connection. Assuming _cws is the client web socket:

var authToken = await FetchToken();
_cws.Options.SetRequestHeader("Authorization", $"Bearer {authToken}");
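
For a Node app, a rough JavaScript equivalent of the two C# snippets might look like the sketch below. It assumes Node 18+ for the global `fetch`; the function names `fetchToken` and `bearerHeader` and the injectable `fetchImpl` parameter are illustrative choices, not part of any SDK.

```javascript
// Sketch only: mirrors the C# FetchToken above for a Node.js (18+) app.
// fetchImpl is injectable so the function can be exercised without a network call.
async function fetchToken(subscriptionKey, region = "westus", fetchImpl = fetch) {
  const url = `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`;
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
  });
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  // Like the C# version, read the response body as a string and use it as the token.
  return response.text();
}

// Header to set on the WebSocket connection, as in the C# snippet.
function bearerHeader(token) {
  return { Authorization: `Bearer ${token}` };
}
```

How the header is attached depends on the WebSocket client in use, so that part is left out here.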

Having reviewed the JavaScript SDK, I see that it supports auth token connections. The sample HTML does not use them, but you can modify the sample code slightly to take advantage of the auth token approach. See https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript/blob/477067fe264159e7ccbd233a01015f9ea03a6d06/samples/browser/Sample.html#L166. I believe you can change this value to true and your solution would then use the auth token approach.

mikebranstein avatar Jul 16 '18 14:07 mikebranstein

Hi @mikebranstein, I have one more question for you. Is there a way to save the audio clip while it is being sent for recognition? We are trying to save it for auditing purposes.

mageshpurpleslate avatar Aug 02 '18 07:08 mageshpurpleslate

@mageshpurpleslate there's no native way to do this in the SDK (to my knowledge), so you'd have to write the code yourself. For example, you could write a middle layer that collects the audio from the microphone, funnels it to your desired storage location, then writes the same stream to the Speech SDK. If you don't want to do that client-side in JavaScript, you could host your own WebSocket app that uses the C# Speech SDK. Your WebSocket app would act as the middle layer, intercepting the audio stream. I have a solution that does this, hosted as a Service Fabric WebSocket app in Azure.
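
A very small sketch of the middle-layer idea in JavaScript (Node). Everything here is hypothetical: `createAuditTap` is not an SDK function, and the two callbacks stand in for your own storage code and for whatever feeds audio to the Speech service.

```javascript
// Hypothetical middle layer: every audio chunk is copied to an audit sink
// before being forwarded unchanged to whatever sends audio to the Speech service.
function createAuditTap(auditSink, forward) {
  return function onAudioChunk(chunk) {
    // Copy the bytes so later mutation of the buffer cannot alter the audit record.
    auditSink(Buffer.from(chunk));
    forward(chunk);
  };
}

// Usage with in-memory sinks; a real app might append to a file or blob instead.
const audited = [];
const sent = [];
const tap = createAuditTap((c) => audited.push(c), (c) => sent.push(c));
tap(Buffer.from([1, 2, 3]));
tap(Buffer.from([4, 5]));
console.log(Buffer.concat(audited).equals(Buffer.concat(sent))); // true
```

The audit copy and the recognition stream see the same bytes in the same order, which is the property you want for auditing.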

mikebranstein avatar Aug 02 '18 12:08 mikebranstein

@mageshpurpleslate After thinking about it for a few more minutes: the C# SDK lets you create a custom audio source/stream. You could create one that audits the audio bytes as they are being fed to the service via the SDK.

mikebranstein avatar Aug 02 '18 12:08 mikebranstein

@mikebranstein thank you. This would help. I will try it out.

mageshpurpleslate avatar Aug 02 '18 16:08 mageshpurpleslate

wss://westus.stt.speech.microsoft.com works with the latest Speech API.

hellowonders avatar Dec 11 '18 10:12 hellowonders