azure-signalr

SignalR serverless mode Ping timeout

Open • aosyatnik opened this issue 3 years ago • 5 comments

Describe the bug

I think we are hitting something similar to https://github.com/Azure/azure-signalr/issues/1279 and https://github.com/Azure/azure-signalr/issues/710.

We are on the Azure SignalR Standard tier with 1 unit (screenshot omitted).

We send a fairly large volume of messages, about 85k per day (screenshot omitted).
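For scale, a rough average rate, assuming the 85k messages are spread evenly over the day (IoT Hub delivers them in `EventData[]` batches, so real traffic is likely burstier), compared with the 15-second default client keep-alive interval of `@microsoft/signalr`:

$$
\frac{85\,000\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 0.98\ \text{messages/s}
\qquad\Longrightarrow\qquad
0.98\ \text{messages/s} \times 15\ \text{s} \approx 15\ \text{messages per keep-alive interval}
$$

In other words, the client is receiving something roughly every second, so it is practically never idle for a full keep-alive interval.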

After roughly an hour of being connected, clients are disconnected with a timeout error. I even configured an Event Grid subscription for disconnection events to check (screenshot omitted).

I found this note: "For ping timeout, it might be caused by high CPU usage or thread pool starvation on the server side." But I have no idea how to check the CPU in our case. We are using serverless mode (screenshot omitted), so there is no app server and no timeout configuration or anything similar to tune.
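Even in serverless mode, the JS client itself has two timeout knobs: `HubConnection.serverTimeoutInMilliseconds` (default 30 s) and `HubConnection.keepAliveIntervalInMilliseconds` (default 15 s), both settable as properties on the connection in 5.x. The sketch below is a hypothetical helper (`applyClientTimeouts` is not part of the library) that sets them while enforcing the documented guidance that the client's server timeout be at least double the server's ping interval. The service's ping interval is an assumption passed in as a parameter, and raising timeouts may only mask an underlying problem rather than fix it.

```typescript
// Subset of the HubConnection surface from "@microsoft/signalr" 5.x that this sketch touches,
// declared locally so the example compiles without the package installed.
interface KeepAliveSettings {
  serverTimeoutInMilliseconds: number;
  keepAliveIntervalInMilliseconds: number;
}

// Hypothetical helper: applies client-side timeouts after a sanity check.
function applyClientTimeouts(
  connection: KeepAliveSettings,
  serverPingIntervalMs: number, // ASSUMED ping interval of the service; not read from Azure SignalR
  serverTimeoutMs: number,      // how long the client waits for any server message before closing
  clientKeepAliveMs: number,    // how often the client pings when it has nothing else to send
): void {
  // Guidance: the server timeout should be at least double the server's ping interval,
  // otherwise one delayed ping is enough to drop the connection.
  if (serverTimeoutMs < 2 * serverPingIntervalMs) {
    throw new RangeError(
      `serverTimeoutMs (${serverTimeoutMs}) should be >= 2 x serverPingIntervalMs (${serverPingIntervalMs})`,
    );
  }
  connection.serverTimeoutInMilliseconds = serverTimeoutMs;
  connection.keepAliveIntervalInMilliseconds = clientKeepAliveMs;
}
```

With a real connection, the call would follow `.build()`, e.g. `applyClientTimeouts(this.messagesHubConnection, 15000, 60000, 10000)`, before `start()`.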

App Insights (as you can see in orange, there are definitely ping issues) (screenshot omitted).

To Reproduce

On the server side, we send messages as follows. It's an Azure Function triggered by IoT Hub:

[FunctionName("Ingest")]
public async Task Run(
    [IoTHubTrigger("IoTHubName", Connection = "IoTHubConnectionString"
#if DEBUG
        , ConsumerGroup = "localdev"
#else
        , ConsumerGroup = "release"
#endif
    )] EventData[] messages,
    [SignalR(HubName = "DeviceMessagesHub")] IAsyncCollector<SignalRMessage> signalRMessages)
{
    // Saves the messages to the DB, plus some other business logic...

    // Broadcast each telemetry message, one SignalR message per event, to every client in the group.
    foreach (var telemetry in messages)
    {
        await signalRMessages.AddAsync(new SignalRMessage
        {
            GroupName = "group-is",
            Target = "target",
            Arguments = new[] { telemetry }
        });
    }
}

The frontend is an Angular app that uses "@microsoft/signalr": "^5.0.6". The setup looks like this:

private async getConnectionInfo(userId: string, endpoint: string): Promise<any> {
  // Calls the negotiate function, passing the user id in the principal header.
  return this.http.post<any>(`${environment.endpoint}${endpoint}`, null, {
    headers: {
      'x-ms-client-principal-id': userId
    }
  }).toPromise();
}

public async init() {
  if (this.messagesHubConnection) {
    return;
  }

  this.me = await this.authService.user.pipe(first()).toPromise();

  if (!this.messagesHubConnection) {
    await this.connectToMessagesHub();
  }
}

private async connectToMessagesHub() {
  const info = await this.getConnectionInfo(this.me.userId, 'NegotiateDeviceMessagesHub');

  // make compatible with old and new SignalRConnectionInfo
  info.accessToken = info.accessToken || info.accessKey;
  info.url = info.url || info.endpoint;

  const options = {
    accessTokenFactory: () => info.accessToken
  };

  this.messagesHubConnection = new signalR.HubConnectionBuilder()
    .withUrl(info.url, options)
    .withAutomaticReconnect()
    .configureLogging(environment.debug ? signalR.LogLevel.Information : signalR.LogLevel.Error)
    .build();

  this.messagesHubConnection.onclose(this.onConnectionClosed.bind(this));
  this.messagesHubConnection.onreconnecting(error => this.onReconnecting(error));

  // Await start() so that init() only resolves once the connection is actually established.
  try {
    await this.messagesHubConnection.start();
  } catch (err) {
    console.error(String(err));
  }
}
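By default, `withAutomaticReconnect()` retries after 0, 2, 10 and 30 seconds and then gives up, at which point `onclose` fires and the connection stays down. Since a permanent drop is not acceptable here, one option is to pass a custom policy object to `withAutomaticReconnect(...)`. The sketch below is a hypothetical policy (`CappedBackoffRetryPolicy` is our name) that never returns `null` and backs off exponentially up to a cap; the `RetryContext`/`IRetryPolicy` shapes are copied locally so it compiles standalone, while the real package exports them.

```typescript
// Local copies of the RetryContext / IRetryPolicy shapes exported by "@microsoft/signalr" 5.x.
interface RetryContext {
  readonly previousRetryCount: number;
  readonly elapsedMilliseconds: number;
  readonly retryReason: Error;
}

interface IRetryPolicy {
  // Returning null stops reconnecting; a number is the delay before the next attempt.
  nextRetryDelayInMilliseconds(retryContext: RetryContext): number | null;
}

// Hypothetical policy: retry forever, 0 ms first, then 1 s, 2 s, 4 s, ... capped at maxDelayMs.
class CappedBackoffRetryPolicy implements IRetryPolicy {
  constructor(private readonly maxDelayMs: number = 30000) {}

  nextRetryDelayInMilliseconds(ctx: RetryContext): number | null {
    if (ctx.previousRetryCount === 0) {
      return 0; // reconnect immediately on the first attempt, like the default policy
    }
    // 2 ** n eventually overflows to Infinity, which Math.min clamps to the cap.
    const delay = 1000 * 2 ** (ctx.previousRetryCount - 1);
    return Math.min(delay, this.maxDelayMs);
  }
}
```

Usage would be `.withAutomaticReconnect(new CappedBackoffRetryPolicy())` in place of the parameterless call. This only changes how long the client keeps trying; it does not address why the connection dropped in the first place.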

Exceptions (if any)

In the browser console I occasionally see this error: Connection disconnected with error 'Error: Server timeout elapsed without receiving a message from the server.' (Sorry, I didn't take a screenshot, but I assume you know this error.) It may be worth mentioning that it doesn't always happen: sometimes I can leave the app running for an hour, sometimes overnight, but eventually the connection is dropped, which is not acceptable for our application.
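That error message comes from the client's own server-timeout check: per the `serverTimeoutInMilliseconds` documentation, if the client receives no message from the server within that window (30 s by default), it considers the server disconnected and closes the connection. The class below is a simplified, hypothetical model of that check (`ServerTimeoutWatchdog` is not a library type) to make the rule concrete; the real client uses timers rather than explicit timestamps.

```typescript
// Hypothetical model of the client-side "server timeout" rule (not the @microsoft/signalr source).
class ServerTimeoutWatchdog {
  private lastReceivedAt: number;

  constructor(private readonly serverTimeoutMs: number, startedAt: number) {
    this.lastReceivedAt = startedAt;
  }

  // Any message from the server, including a ping, counts as a sign of life.
  onMessageReceived(now: number): void {
    this.lastReceivedAt = now;
  }

  // True once more than serverTimeoutMs has passed without anything arriving.
  hasTimedOut(now: number): boolean {
    return now - this.lastReceivedAt > this.serverTimeoutMs;
  }
}
```

Under this rule, a client that receives telemetry every second should never hit the timeout while the stream is flowing, so seeing the error suggests that messages (and pings) had stopped arriving, for example because the connection had already been closed on the service side.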

I would be very grateful if you could give some tips on how to resolve this issue?

Further technical details

  • Your Azure SignalR SDK version - Standard tier, version 1.0? Not sure (screenshot omitted)

  • Your Server ASPNETCORE version or Assembly version of Microsoft.AspNetCore.SignalR - none; we are using serverless mode

  • Your SignalR Client SDK version - "@microsoft/signalr": "^5.0.6"

aosyatnik • Jun 08 '21 14:06

Ah, it happened again and I took a screenshot (screenshot omitted).

It looks like this time it happened during the reconnect call.

aosyatnik • Jun 08 '21 15:06

Could you share a typical client HAR file covering about 5 minutes? It doesn't have to be captured while the issue takes place (that would be nice, but isn't necessary); I'd like to understand the client logic from the network trace. https://docs.microsoft.com/en-us/azure/azure-signalr/signalr-howto-troubleshoot-method#how-to-view-the-traffic-and-narrow-down-the-issue

Also, could you send your resource ID and the timestamp of when the issue took place to lianwei(at)microsoft.com?

vicancy • Jun 09 '21 04:06

I sent the HAR files by email. Thank you!

aosyatnik • Jun 09 '21 07:06

The HAR shows that the client fails to send ping messages on time when it is busy handling incoming messages.

vicancy • Jun 10 '21 02:06

This is caused by a bug in the SignalR JS client: https://github.com/dotnet/aspnetcore/issues/33629

vicancy • Jun 18 '21 02:06
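The maintainer's diagnosis (the client does not send pings on time while busy with incoming messages) can be illustrated with a simplified, hypothetical model. It assumes the faulty behavior amounts to the ping deadline being pushed back on every *received* message, so a steady incoming stream keeps postponing the ping indefinitely; the names below are ours, and the linked aspnetcore issue describes the actual change. The fixed variant only reschedules the ping when the client itself sends something.

```typescript
// Hypothetical, simplified model of client keep-alive scheduling; does not mirror the library source.
class KeepAliveScheduler {
  private nextPingAt: number;

  constructor(
    private readonly keepAliveMs: number,
    private readonly resetOnReceive: boolean, // true models the ASSUMED faulty behavior
    startedAt: number,
  ) {
    this.nextPingAt = startedAt + keepAliveMs;
  }

  // Sending anything (an invocation or a ping) proves liveness to the service.
  onMessageSent(now: number): void {
    this.nextPingAt = now + this.keepAliveMs;
  }

  // In the faulty variant, every incoming message pushes the next ping further out.
  onMessageReceived(now: number): void {
    if (this.resetOnReceive) {
      this.nextPingAt = now + this.keepAliveMs;
    }
  }

  // Timer tick: returns true if a ping is due, and records it as sent.
  tick(now: number): boolean {
    if (now < this.nextPingAt) {
      return false;
    }
    this.onMessageSent(now);
    return true;
  }
}

// Simulate 60 s with one incoming message and one timer tick per second, 15 s keep-alive.
function countPings(resetOnReceive: boolean): number {
  const scheduler = new KeepAliveScheduler(15000, resetOnReceive, 0);
  let pings = 0;
  for (let t = 1000; t <= 60000; t += 1000) {
    scheduler.onMessageReceived(t);
    if (scheduler.tick(t)) {
      pings++;
    }
  }
  return pings;
}
```

In this model `countPings(true)` is 0 and `countPings(false)` is 4 (pings at 15, 30, 45 and 60 s). With roughly one message per second arriving, as in the reporter's workload, the faulty variant never pings, which would let a service-side ping timeout close the connection.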