amazon-chime-sdk-js
Callback to listen for SignalingChannel ready state
To summarize
We rely on the `AudioVideoFacade.realtimeSendDataMessage(...)` function to send messages to other attendees in the call. But sometimes sending data results in `RealtimeApiFailed` with the error `Signaling client is not ready`, which terminates the session for that attendee.
What are you trying to do?
Looking for a callback that would notify when the SignalingChannel changes state from connected to disconnected and vice versa. Receiving the reason for the change in state would also be very helpful.
I have checked all the callbacks exposed by the Chime SDK and did not find any promising way to determine whether the SignalingClient is ready or not. This new callback would help us determine when not to send messages and prevent abrupt session termination.
Some other observations/conditions around when we see this failure
- We get warning logs that indicate the client will be reconnecting:
  a. `will retry due to status code ConnectionHealthReconnect` is triggered at `DefaultAudioVideoController.ts`.
  b. Followed by `AudioVideoObserver.audioVideoDidStartConnecting(true /* reconnecting */)` getting called.
- The warning log `missed pong 1 time(s)` is fired.
- `AudioVideoObserver.connectionDidSuggestStopVideo()` is called.
- A few instances of failure without any of the above.
We can handle (1) by not invoking `realtimeSendDataMessage`, because we know for sure that the connection is being re-established. But in the other cases we don't know for sure, and there are too many places to keep track of.
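Case (1) above can be handled with a small wrapper that gates sends while a reconnect is in flight, driven by the `AudioVideoObserver` callbacks the SDK already exposes. This is a sketch; `SafeDataMessenger`, `AudioVideoLike`, and the drop-instead-of-queue policy are illustrative choices, not SDK APIs.

```typescript
// Minimal shape of the SDK facade this sketch relies on (assumption).
interface AudioVideoLike {
  realtimeSendDataMessage(topic: string, data: string): void;
}

class SafeDataMessenger {
  private reconnecting = false;

  constructor(private audioVideo: AudioVideoLike) {}

  // Forward these two calls from your AudioVideoObserver implementation.
  audioVideoDidStartConnecting(reconnecting: boolean): void {
    if (reconnecting) {
      this.reconnecting = true;
    }
  }

  audioVideoDidStart(): void {
    this.reconnecting = false;
  }

  // Returns true if the message was handed to the SDK, false if dropped.
  trySend(topic: string, data: string): boolean {
    if (this.reconnecting) {
      return false; // drop (or queue) instead of hitting RealtimeApiFailed
    }
    this.audioVideo.realtimeSendDataMessage(topic, data);
    return true;
  }
}
```

Instead of dropping, an application could queue messages while `reconnecting` is true and flush them in `audioVideoDidStart`, at the cost of delivering stale data after the reconnect.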
@hesawant Thanks for reporting this. I have reproduced the issue. I don't think we should throw an error there; we should log it as an error instead. However, the problem remains of how to alert consumers that the message was not sent. I will bring this back and discuss it with the team.
@ltrung I wanted to follow up on the status of this. It has become quite a headache for us, because when the signaling client state changes to not ready, we're usually sending a very high volume of data messages, which in turn puts a big burden on our internal error-reporting pipeline, as each failed message is logged as an error. We can hack around it for the time being, but it would be very helpful to have a public method we can use to check the state of the signaling client, or a callback that informs us of the state change.
@brycepj Thanks. I will bring this back and discuss with our team. By the way, could you give us some details on how you work around the current issue?
Some follow up questions:
- When the signaling client state changes to not ready, did reconnection happen? Did you receive `audioVideoDidStartConnecting` for reconnecting and `audioVideoDidStart` when the reconnection finished? If so, can we use those to resend the data messages?
- Could you give more details on your current workaround? One workaround I can think of is to check `audioVideo.audioVideoController.meetingSessionContext.signalingClient.ready()` before sending a data message.
@ltrung thanks so much for the prompt and helpful reply 🙏
> When the signaling client state changes to not ready, did reconnection happen? Did you receive `audioVideoDidStartConnecting` for reconnecting and `audioVideoDidStart` when the reconnection finished? If so, can we use that to resend the data messages.
Yes, I can see both callbacks being fired and handled properly on our end. But in many of the cases we're looking at, the signaling channel disconnecting follows a long string of throttled messages being returned. This happens during user actions that trigger data messages to be sent very frequently (on `mousemove`). So even though the reconnect happens, I can see hundreds, if not thousands, of throttled messages and signaling client readiness errors before that.
> Could you give more details on your current workaround? One workaround i think of is to just check `audioVideo.audioVideoController.meetingSessionContext.signalingClient.ready()` before sending data message.
The current workaround is just debouncing signaling client error logs that match the message in question. Very brittle, but a stopgap. Thanks for the tip about accessing the signalingClient. I looked for a way to access it for a while, but didn't make any headway. Your way is much better though, so we'll give that a go.
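The `ready()` check suggested above can be wrapped in a small guard. Note that this reaches into non-public SDK internals (`audioVideoController.meetingSessionContext.signalingClient`), so the property path may break across SDK versions; the `sendIfReady` helper name is illustrative, and this is a stopgap sketch rather than a supported API.

```typescript
// Guard realtimeSendDataMessage behind the internal signaling client's
// ready() state. `audioVideo` is typed loosely because the path below
// traverses internals that the public typings do not expose.
function sendIfReady(
  audioVideo: any, // a DefaultAudioVideoFacade at runtime (assumption)
  topic: string,
  data: string
): boolean {
  const signalingClient =
    audioVideo?.audioVideoController?.meetingSessionContext?.signalingClient;
  if (!signalingClient || !signalingClient.ready()) {
    return false; // signaling channel not ready; skip instead of erroring
  }
  audioVideo.realtimeSendDataMessage(topic, data);
  return true;
}
```

Because the check and the send are not atomic, the client could still disconnect between them, so callers should still be prepared for an occasional failure.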
> Yes, I can see both callbacks being fired and handled properly on our end. But in many of the cases we're looking at, the signaling channel disconnecting follows a long string of throttled messages being returned.
Should we stop sending messages after we see the first throttled message? Consecutive throttled messages will cause the disconnection.
As mentioned in the API Overview, the current limit is:
> If you send too many messages at once, your messages may be returned to you with the throttled flag set. The current throttling soft limit for Data Messages is 100 messages per second with the maximum burst size of 200 for a meeting (i.e. a 'token bucket' of size 200 that refills at 100 tokens per second). If you continue to exceed the throttle limit, then the server may hang up the connection. The hard limit for each attendee is 200 messages per second with the maximum burst of 2000 and for a meeting is 500 messages per second with the maximum burst of 10000.
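The "stop after the first throttled message" suggestion can be sketched as a small backoff tracker. The Chime SDK returns throttled data messages to the sender with the `throttled` flag set on the `DataMessage`, so a receive handler can detect the soft limit; the `ThrottleBackoff` class and the one-second pause below are illustrative choices, not SDK behavior.

```typescript
// Minimal shape of the received message this sketch relies on.
interface DataMessageLike {
  throttled: boolean;
}

class ThrottleBackoff {
  private pausedUntilMs = 0;

  // backoffMs is an arbitrary illustrative pause, not a documented value.
  constructor(private readonly backoffMs = 1000) {}

  // Call this from your realtimeSubscribeToReceiveDataMessage handler.
  onDataMessage(msg: DataMessageLike, nowMs = Date.now()): void {
    if (msg.throttled) {
      this.pausedUntilMs = nowMs + this.backoffMs;
    }
  }

  // Check before each realtimeSendDataMessage call.
  canSend(nowMs = Date.now()): boolean {
    return nowMs >= this.pausedUntilMs;
  }
}
```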
I'm curious -- does a single throttled message change the ready state of the signal client? If we stopped sending messages, how we would we know when to try again?
No, a single throttled message does not disconnect the signaling client. It just means you reached the soft limit on our backend, as detailed above (100 messages per second with a maximum burst size of 200 for a meeting). The client is disconnected only once you reach the hard limit. If you can control your rate to stay below that soft limit, that should prevent this issue in the first place. If you think the soft limit rate is too low, please cut an AWS Support ticket to us :)
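Staying below the soft limit client-side can be done with the same token-bucket model the quoted documentation describes (a bucket of 200 that refills at 100 tokens per second). This is a sketch mirroring those numbers; in practice you would likely configure some headroom below them, and `TokenBucket` is an illustrative name, not an SDK class.

```typescript
// Client-side token bucket matching the documented soft limit:
// capacity 200 (burst), refilling at 100 tokens per second.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity = 200,     // burst size
    private readonly refillPerSec = 100, // sustained rate
    nowMs = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true if a message may be sent now; call before each send.
  tryConsume(nowMs = Date.now()): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec
    );
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

For the `mousemove` case described earlier in the thread, pairing a bucket like this with coalescing (sending only the latest position when a token becomes available) keeps the rate bounded without dropping the most recent state.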