azure-signalr icon indicating copy to clipboard operation
azure-signalr copied to clipboard

Server sent events not firing onclose/reconnect event on expiration of JWT

Open Jonno12345 opened this issue 4 years ago • 12 comments

Describe the bug

To test that our reconnection logic on the expiration of the Azure SignalR JWT worked correctly, I set the AccessTokenLifetime to 1 minute.

services.AddSignalR()
                .AddNewtonsoftJsonProtocol()
                .AddAzureSignalR(o =>
                {
                    o.AccessTokenLifetime = TimeSpan.FromMinutes(1);
                });
        

Whilst using WebSockets, as expected, there was no disconnection after 1 minute and the connection remained open for the duration of testing. Whilst using LongPolling, as soon as a 401 error is received from a long poll response, the onclose event is fired allowing us to manage a reconnection.

However, whilst using SSE, the occasional ping or attempt to post a message back to the Azure SignalR server results in a 401 error that does not appear to initiate an onclose or reconnect event. As such, we have no way to know that the communication between the server has now broken, and cannot force it to renegotiate.

This makes SSE functionally unusable with Azure SignalR as our clients will permanently disconnect after the 1 hour JWT timeout.

To Reproduce

As above, using a short JWT expiry and the following javascript connection:

new signalR.HubConnectionBuilder()
            .withUrl(connectionUrl,
                {
                    transport: signalR.HttpTransportType.ServerSentEvents,
                })
            .withAutomaticReconnect() //optional, but neither reconnection nor onclose are occurring
            .build();

Exceptions (if any)

The console fills with the following errors, however no attempt to reconnect trigger an 'onclose' or 'onreconnecting' event occurs.

image

With LongPolling, we receive a single error, followed by a reconnection (onclose without the withAutomaticReconnect())

Further technical details

Jonno12345 avatar Jun 29 '20 16:06 Jonno12345

What is your app server's netcoreapp version or what is the assembly version of Microsoft.AspNetCore.SignalR used in your app server?

When client to server messages are all 401, client=>server keep-alive is broken so server detects client lose and should close client in 15 seconds, and thus trigger reconnect, like below: image

This client=>server ping check is enabled once server received the first client ping message. If you set TTL to 1 minute, it should be enough for server to receive the first client ping and enable the ping check.

There are 2 possibilities in my mind that this ping check is not triggered:

  1. Your app server is running in Debug mode, the check is skipped in Debug mode (https://github.com/dotnet/aspnetcore/blob/3af94f18cb841a27cc9e607f45b880df64201f70/src/SignalR/server/Core/src/HubConnectionContext.cs#L567)

  2. You are using an old version of Microsoft.AspNetCore.SignalR, e.g. 1.0.0, where server does not yet have this "client ping check" feature.

vicancy avatar Jul 02 '20 03:07 vicancy

@vicancy Thanks for your response. Our setup is as follows:

I have a .NET Core 3.1 server using [email protected] and [email protected].

I have published the application to an Azure App Service plan, but running my application that connects to the SignalR hub on the .NET Core 3.1 service in localhost (debugger not attached). It's connecting using the [email protected] JS library from cdnjs.

It would appear that it's our use of the AccessTokenFactory to provide an authentication JWT that is breaking it. When no JWT is provided from our system, the automatic reconnect is working as expected, dying on a bad heartbeat.

image

However, if logged in and we provide a JWT in the AccessTokenFactory, we get the following:

image

I have left it running longer than the duration of our authentication bearer tokens provided in AccessTokenFactory, and it never attempts to renegotiate with the server.

Jonno12345 avatar Jul 02 '20 12:07 Jonno12345

aspnetcore31 should have Microsoft.AspNetCore.SignalR version 3.1 embedded, so please remove the explicit reference of [email protected] and have a retry.

vicancy avatar Jul 03 '20 01:07 vicancy

aspnetcore31 should have Microsoft.AspNetCore.SignalR version 3.1 embedded, so please remove the explicit reference of [email protected] and have a retry.

Thanks, I wasn't aware the nuget package was legacy, I have now removed it, however I can confirm the issue persists.

On further digging, I've realised the JWT in AccessTokenFactory isn't to blame, but slightly different logic we implement when a user is logged in. In certain conditions, we send a message to the server in 5 second intervals. This is I believe happening more frequently than the heartbeat. It would seem that if one of these messages is sent and returns a 401 error, that the heartbeat no longer occurs (presumably due to this https://github.com/dotnet/aspnetcore/blob/d5cf36acc71fe576541bbb8d694a88815c22c800/src/SignalR/clients/ts/signalr/src/HubConnection.ts#L344), and therefore no mechanism exists to disconnect us. It would seem that a 401 from sending a message to the server whilst in SSE mode doesn't trigger the same disconnect logic that a heartbeat would.

This might be for good reason (a legitimate 401 response), however in longpolling when an attempt to poll the server receives a 401, this causes a disconnection allowing us to force a reconnect. It would then seem that the mechanism keeping long polling alive is not the heartbeat in our case, but the attempt to repoll the server. Where this isn't a requirement in SSE, we are 100% reliant on a heartbeat, which is defeated by a client making attempts to send data to the server.

Jonno12345 avatar Jul 03 '20 10:07 Jonno12345

The code you shared is for the client to check server sent ping, what SSE depends on is the reverse direction (server to check client sent ping), so it is not relevant.

If messages sent to the server get 401, it means that the server does not receive these client messages, which means the server does not receive any messages from client in 15 seconds, and the server will stop the connection. (https://github.com/dotnet/aspnetcore/blob/3af94f18cb841a27cc9e607f45b880df64201f70/src/SignalR/server/Core/src/HubConnectionContext.cs#L567)

So it should work.

Is it possible to share with me your mini-project that can repro the issue for me to take a look?

vicancy avatar Jul 06 '20 02:07 vicancy

The code you shared is for the client to check server sent ping, what SSE depends on is the reverse direction (server to check client sent ping), so it is not relevant.

If messages sent to the server get 401, it means that the server does not receive these client messages, which means the server does not receive any messages from client in 15 seconds, and the server will stop the connection. (https://github.com/dotnet/aspnetcore/blob/3af94f18cb841a27cc9e607f45b880df64201f70/src/SignalR/server/Core/src/HubConnectionContext.cs#L567)

So it should work.

Is it possible to share with me your mini-project that can repro the issue for me to take a look?

Hi, sorry for the delay. I've uploaded a small repo here:

https://github.com/Jonno12345/azure-signalr-test

It needs an azure signalr connection string. I ran with 'TestAzureSignalRServer' both deployed to azure, and running without debugger attached (ctrl+f5) and always had the same result, with the constant 'setInterval' firing it would never force a disconnect.

Hopefully there's something very simple I've overlooked. Please let me know if anything doesn't make sense or isn't behaving as I described.

Jonno12345 avatar Jul 08 '20 10:07 Jonno12345

My apologies.

In certain conditions, we send a message to the server in 5 second intervals. This is I believe happening more frequently than the heartbeat. It would seem that if one of these messages is sent and returns a 401 error, that the heartbeat no longer occurs (presumably due to this https://github.com/dotnet/aspnetcore/blob/d5cf36acc71fe576541bbb8d694a88815c22c800/src/SignalR/clients/ts/signalr/src/HubConnection.ts#L344), and therefore no mechanism exists to disconnect us

This is indeed the root cause, that the server-side ping check is not enabled because no client-side ping message is received. I think SignalR server-side should have some options to enable the client-side ping check instead of only relying on PingMessage. I created an issue to track the progress https://github.com/dotnet/aspnetcore/issues/23794

Before the option is provided, one way to mitigate the issue is to send the ping explicitly right after the connection is started:

        connection.start().then(async function () {
            await connection.sendMessage(connection.cachedPingMessage);
            setInterval(...);
        });

Another way not relying on the keep-alive (after all it has 15s delay) is that with 401 error, the client stop and restart the connection proactively.

vicancy avatar Jul 09 '20 05:07 vicancy

@vicancy not at all, thank you for sticking with it. For now, as per your final note, wrapping up our invokes with a catch that immediately initiates a reconnection on 401 may be a smoother experience in general, without as large of a delay.

Jonno12345 avatar Jul 09 '20 09:07 Jonno12345

@Jonno12345 we will update service, it will abort connections with expired token.

vwxyzh avatar Aug 14 '20 04:08 vwxyzh

Not related exactly to this old open issue, but a similar issue does exist where first ping message must be sent from client so the server is tracking timeout and firing the OnDisconnectedAsync(Exception exception).

Basically, if an abrupt disconnect is to happen in the first 15 sec (default keepAlive interval), the OnDisconnected won't be trigger for more than 15 minutes.

A manual mimic of the ping would possibly solve this issue? However, your suggested workaround:

        connection.start().then(async function () {
            await connection.sendMessage(connection.cachedPingMessage);
            setInterval(...);
        });

...does not mimic the ping message correctly and sends a regular type 1 message with cachedPingMessage string value as content. So the effect of the initial keep alive ping with messageType:6 is not happening.

We can see by the issue you linked that a fix is being worked, but until then, and for those not able to upgrade, reducing the keepAliveInterval and therefore the serverTimeout (to keep the 1:2 ratio) seems to be the only solution to reduce and limit the potential bug.

BozeTadic avatar Jul 07 '22 11:07 BozeTadic

does not mimic the ping message correctly and sends a regular type 1 message with cachedPingMessage string value as content.

Oh what version of the client library are you using? I just did a quick test and it did work with json protocol and 5.0 js client

await connection.start()
await connection.sendMessage(connection.cachedPingMessage);
image

vicancy avatar Jul 08 '22 03:07 vicancy

Oh what version of the client library are you using?

The client library version is "^6.0.4" The change seems to be the renamed cachedPingMessage > _cachedPingMessage.

It works with the underscore. A simple oversight.

await connection.start()
await connection.sendMessage(_connection.cachedPingMessage);

Thank you for helping.

BozeTadic avatar Jul 08 '22 07:07 BozeTadic