node-slack-sdk
node-slack-sdk copied to clipboard
socket-mode 1.x: explicit server disconnect during 'connecting' state intermittently not handled
@filmaj I seem to be hitting the following now on 1.3.5:
[INFO] socket-mode:SocketModeClient:0 Going to establish a new connection to Slack ...
/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76
throw new Error('Unhandled event \'' + event + '\' in state \'' + this.currentState + '\'.');
^
Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
at StateMachine.handleUnhandledEvent (/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76:13)
at /home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
at TaskScheduler.execute (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:29:7)
at TaskScheduler.enqueue (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:19:12)
at HierarchicalStateMachine.handle (/home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
at SocketModeClient.onWebSocketMessage (/home/patrick/slack/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
at WebSocket.onMessage (/home/patrick/slack/node_modules/ws/lib/event-target.js:132:16)
at WebSocket.emit (node:events:518:28)
at Receiver.receiverOnMessage (/home/patrick/slack/node_modules/ws/lib/websocket.js:1068:20)
at Receiver.emit (node:events:518:28)
With debug logging it shows:
[DEBUG] socket-mode:SocketModeClient:0 Received a message on the WebSocket: {"type":"disconnect","reason":"too_many_websockets","debug_info":{"host":"applink-2"}}
[DEBUG] socket-mode:SocketModeClient:0 Received "disconnect" (reason: too_many_websockets) message - will attempt reconnect
Originally posted by @MadrMan in https://github.com/slackapi/node-slack-sdk/issues/1654#issuecomment-2094100237
Any PR to address this should target the socket-mode-1.x
branch as this would be a 'maintenance' fix for the now-outdated 1.x line of socket-mode.
This is a tricky issue to handle, because the 1.x line of socket-mode's architecture is such that there is a single state machine to handle transitions, but potentially, during a reconnect, two WebSocket connections are briefly active/overlapping as a connection is re-established. The underlying problem is, during this reconnect when we have two WS connections open, each WebSocket connection pumps events into the single state machine, causing unexpected state transitions.
Consider the case where, at more or less the same time, one WS connection sends a "disconnect" event to the state machine and another sends a "open" or "connected" event. The single state machine will get confused.
This issue is resolved in socket-mode 2.x, as the architecture has completely changed. Socket-mode 2.0 is available in bolt 4.0.0 rc1, and has been used successfully by some customers who have adopted it early.
We recently stumbled upon this issue on multiple projects and here are the debug logs for the typical occurrence.
2024-09-03 02:39:12.801 Transitioning to state: connected:ready
2024-09-03 02:39:12.801 Started running a new heart beat job
2024-09-03 02:39:12.801 Terminated the old connection
2024-09-03 02:39:12.801 Switched to the secondary connection
2024-09-03 02:39:12.801 Switching to the secondary connection ...
2024-09-03 02:39:12.801 Transitioning to state: connected:preparing
2024-09-03 02:39:12.801 Now connected to Slack
2024-09-03 02:39:12.801 Transitioning to state: connected
2024-09-03 02:39:12.800 Received a message on the WebSocket: {\"type\":\"hello\",\"num_connections\":10,\"debug_info\":{\"host\":\"applink-9\",\"build_number\":97,\"approximate_connection_time\":18060},\"connection_info\":{\"app_id\":\"A03UYB9U6U9\}
2024-09-03 02:39:12.788 Transitioning to state: connecting:handshaking
2024-09-03 02:39:12.788 Secondary WebSocket open event received (connection established)
2024-09-03 02:39:12.693 Transitioning to state: connecting:authenticated
2024-09-03 02:39:12.693 apiCall('apps.connections.open') end
2024-09-03 02:39:12.693 http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:39:12.692 http response received
2024-09-03 02:39:12.596 http request headers: {}
2024-09-03 02:39:12.596 http request body: {}
2024-09-03 02:39:12.596 http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:39:12.596 apiCall('apps.connections.open') start
2024-09-03 02:39:12.596 Going to retrieve a new WSS URL ...
2024-09-03 02:39:12.596 Transitioning to state: connecting:authenticating
2024-09-03 02:39:12.596 Going to establish a new connection to Slack ...
2024-09-03 02:39:12.596 Transitioning to state: connecting
2024-09-03 02:39:12.596 Reconnecting to Slack ...
2024-09-03 02:39:12.596 Transitioning to state: reconnecting
2024-09-03 02:39:12.593 Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:39:02.594 Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:39:02.590 at Receiver.emit (node:events:519:28)
2024-09-03 02:39:02.590 at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:39:02.590 at WebSocket.emit (node:events:519:28)
2024-09-03 02:39:02.590 at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:39:02.590 at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:39:02.590 at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:39:02.590 at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:39:02.590 at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:39:02.590 at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:39:02.590 at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:39:02.590 Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:39:02.590 This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:39:02.587 Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-14\}
2024-09-03 02:39:02.579 Transitioning to state: connecting:handshaking
2024-09-03 02:39:02.578 Secondary WebSocket open event received (connection established)
2024-09-03 02:39:02.449 Transitioning to state: connecting:authenticated
2024-09-03 02:39:02.449 apiCall('apps.connections.open') end
2024-09-03 02:39:02.449 http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:39:02.448 http response received
2024-09-03 02:39:02.320 http request headers: {}
2024-09-03 02:39:02.320 http request body: {}
2024-09-03 02:39:02.320 http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:39:02.320 apiCall('apps.connections.open') start
2024-09-03 02:39:02.320 Going to retrieve a new WSS URL ...
2024-09-03 02:39:02.320 Transitioning to state: connecting:authenticating
2024-09-03 02:39:02.320 Going to establish a new connection to Slack ...
2024-09-03 02:39:02.320 Transitioning to state: connecting
2024-09-03 02:39:02.320 Reconnecting to Slack ...
2024-09-03 02:39:02.320 Transitioning to state: reconnecting
2024-09-03 02:39:02.318 Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:38:52.319 Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:38:52.316 at Receiver.emit (node:events:519:28)
2024-09-03 02:38:52.316 at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:38:52.316 at WebSocket.emit (node:events:519:28)
2024-09-03 02:38:52.316 at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:38:52.316 at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:38:52.316 at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:38:52.316 at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:38:52.316 at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:38:52.316 at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:38:52.316 at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:38:52.316 Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:38:52.316 This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:38:52.315 Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-4\}
2024-09-03 02:38:52.305 Transitioning to state: connecting:handshaking
2024-09-03 02:38:52.305 Secondary WebSocket open event received (connection established)
2024-09-03 02:38:52.198 Transitioning to state: connecting:authenticated
2024-09-03 02:38:52.198 apiCall('apps.connections.open') end
2024-09-03 02:38:52.198 http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:38:52.195 http response received
2024-09-03 02:38:52.097 http request headers: {}
2024-09-03 02:38:52.097 http request body: {}
2024-09-03 02:38:52.097 http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:38:52.097 apiCall('apps.connections.open') start
2024-09-03 02:38:52.097 Going to retrieve a new WSS URL ...
2024-09-03 02:38:52.097 Transitioning to state: connecting:authenticating
2024-09-03 02:38:52.097 Going to establish a new connection to Slack ...
2024-09-03 02:38:52.097 Transitioning to state: connecting
2024-09-03 02:38:52.097 Reconnecting to Slack ...
2024-09-03 02:38:52.097 Transitioning to state: reconnecting
2024-09-03 02:38:52.094 Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:38:42.093 Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:38:42.089 at Receiver.emit (node:events:519:28)
2024-09-03 02:38:42.089 at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:38:42.089 at WebSocket.emit (node:events:519:28)
2024-09-03 02:38:42.089 at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:38:42.089 at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:38:42.089 at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:38:42.089 at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:38:42.089 at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:38:42.089 at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:38:42.089 at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:38:42.089 Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:38:42.089 This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:38:42.087 Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-12\}
2024-09-03 02:38:42.080 Transitioning to state: connecting:handshaking
2024-09-03 02:38:42.078 Secondary WebSocket open event received (connection established)
2024-09-03 02:38:41.976 Transitioning to state: connecting:authenticated
2024-09-03 02:38:41.976 apiCall('apps.connections.open') end
2024-09-03 02:38:41.976 http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:38:41.974 http response received
2024-09-03 02:38:41.877 http request headers: {}
2024-09-03 02:38:41.877 http request body: {}
2024-09-03 02:38:41.877 http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:38:41.877 apiCall('apps.connections.open') start
2024-09-03 02:38:41.877 Going to retrieve a new WSS URL ...
2024-09-03 02:38:41.877 Transitioning to state: connecting:authenticating
2024-09-03 02:38:41.877 Going to establish a new connection to Slack ...
2024-09-03 02:38:41.877 Transitioning to state: connecting
2024-09-03 02:38:41.877 Reconnecting to Slack ...
2024-09-03 02:38:41.877 Transitioning to state: reconnecting
2024-09-03 02:38:41.874 Secondary WebSocket close event received (code: 1006, reason: )
2024-09-03 02:38:31.878 Received \"disconnect\" (reason: too_many_websockets) message - will attempt reconnect
2024-09-03 02:38:31.872 at Receiver.emit (node:events:519:28)
2024-09-03 02:38:31.872 at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1070:20)
2024-09-03 02:38:31.872 at WebSocket.emit (node:events:519:28)
2024-09-03 02:38:31.872 at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:132:16)
2024-09-03 02:38:31.872 at SocketModeClient.onWebSocketMessage (/app/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
2024-09-03 02:38:31.872 at HierarchicalStateMachine.handle (/app/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
2024-09-03 02:38:31.872 at TaskScheduler.enqueue (/app/node_modules/finity/lib/core/TaskScheduler.js:19:12)
2024-09-03 02:38:31.872 at TaskScheduler.execute (/app/node_modules/finity/lib/core/TaskScheduler.js:29:7)
2024-09-03 02:38:31.872 at /app/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
2024-09-03 02:38:31.872 at StateMachine.handleUnhandledEvent (/app/node_modules/finity/lib/core/StateMachine.js:76:13)
2024-09-03 02:38:31.872 Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
2024-09-03 02:38:31.872 This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
2024-09-03 02:38:31.870 Received a message on the WebSocket: {\"type\":\"disconnect\",\"reason\":\"too_many_websockets\",\"debug_info\":{\"host\":\"applink-1\}
2024-09-03 02:38:31.862 Transitioning to state: connecting:handshaking
2024-09-03 02:38:31.862 Secondary WebSocket open event received (connection established)
2024-09-03 02:38:31.743 Transitioning to state: connecting:authenticated
2024-09-03 02:38:31.743 apiCall('apps.connections.open') end
2024-09-03 02:38:31.743 http request result: {\"ok\":true,\"url\":\"wss://wss-primary.slack.com/link/?ticket=TICKET&app_id=APP_ID\",\"response_metadata\":{\"scopes\":[\"connections:write\"],\"acceptedScopes\":[\"connections:write\"]}}
2024-09-03 02:38:31.740 http response received
2024-09-03 02:38:31.610 http request headers: {}
2024-09-03 02:38:31.610 http request body: {}
2024-09-03 02:38:31.609 http request url: https://slack.com/api/apps.connections.open
2024-09-03 02:38:31.609 apiCall('apps.connections.open') start
2024-09-03 02:38:31.609 Going to retrieve a new WSS URL ...
2024-09-03 02:38:31.609 Transitioning to state: connecting:authenticating
2024-09-03 02:38:31.609 Going to establish a new connection to Slack ...
2024-09-03 02:38:31.609 Transitioning to state: connecting
2024-09-03 02:38:31.609 Reconnecting to Slack ...
2024-09-03 02:38:31.609 Transitioning to state: reconnecting
2024-09-03 02:38:31.609 Terminated the heart beat job
2024-09-03 02:38:31.609 Cancelled the job waiting for ping from Slack
2024-09-03 02:38:31.605 A pong wasn't received from the server before the timeout of 5000ms!
The first log is a big clue:
2024-09-03 02:38:31.605 A pong wasn't received from the server before the timeout of 5000ms!
This is not a good sign, and may actually point to an issue in the Slack backend; under the hood, socket-mode sends ping
messages to the Slack backend as a kind of heartbeat, to ensure the connection is working properly. The fact that Slack did not respond with the per-specification pong
message within 5 seconds is unexpected and unusual.
Hi @filmaj do you have a plan to release the stable 4.x version? I want to determine whether we upgrade to rc1 or wait the stable version.
@weijiany do you mean a bolt-js v4 release? We have started to put together plans for it but there is no specific timeline. If this issue is affecting you, I recommend trying the v4 rc.