socket.io
socket.io copied to clipboard
Heartbeat detection is delayed when timers throttled, and messages can be sent over expired connection
Describe the bug
Due to chrome (and maybe other browsers) throttling timers when the device sleeps, the heartbeat detection logic doesn't always run in time, causing messages to be lost before reconnection happens late.
To Reproduce
Socket.IO server version: 4.7.5
Server
import { Server } from "socket.io";
const io = new Server(3000, { pingTimeout: 5_000, pingInterval: 25_000});
io.on("connection", (socket) => {
console.log(`connect ${socket.id}`);
socket.on("disconnect", () => {
console.log(`disconnect ${socket.id}`);
});
});
Socket.IO client version: 4.7.5
Client
import { io } from "socket.io-client";
// use a tunnel like the ports feature in vscode to make the connection work over the internet and use that host here
const socket = io("ws://<replace with host>:3000/", {});
socket.on("connect", () => {
console.log(`connect ${socket.id}`, new Date());
});
socket.on("disconnect", () => {
console.log("disconnect", new Date());
});
Expected behavior
- Load the page and wait for the connected message
- Close the laptop with it unplugged so that it goes to sleep
- Wait 60 seconds or so to make sure it's gone to sleep for a bit
- Open the laptop and go back to the page
- The latest connection should not have received a ping in the last 30 seconds, and therefore it should be closed. But because chrome froze the timer that's going to do the disconnection it will take a while longer before that actually happens
- If a message is sent before the reconnection it will probably go into the void and be lost
Proposed fixes
Check if connection is expired before sending every message
I opened a PR for this here: https://github.com/socketio/socket.io/pull/5134
Let me know what you think
This doesn't cause the reconnect to happen immediately when the page is woken up, but it does mean attempting to send a message should trigger the reconnection, so outgoing messages shouldn't get lost.
Use setInterval and periodically check if a ping has not been received instead of setTimeout
I think this and storing the Date.now() of the last ping should be more accurate, and cause the reconnection to happen almost immediately when the page is woken up. Can also do the same on a visibilitychange event when the page goes back to visible.
Can look into a PR for this if you think it makes sense?
Platform:
- Device: Mac
- Browser: Chrome
- OS: Mac OS 14.5 (23F79)
Additional context
https://github.com/socketio/socket.io/issues/3507#issuecomment-977681297 is another fix in the past related to throttled timers
Hi! That's an interesting idea, thanks for the detailed analysis :+1:
However, I'm wondering whether handling such a corner case is really worth it. I don't think we will be able to handle all cases where the client fails to send a packet. If one really want a packet to be received by the server, aren't retries sufficient?
Reference: https://socket.io/docs/v4/client-options/#retries
Hey @darrachequesne
The problem with retries is that it requires that you wait for the ack, and also with retrying you need to make sure you either are ok processing duplicate messages or are including some kind of identifier to make sure you can handle duplicate messages on the server, which is not trivial. I.e. the connection might not have already been dead, it might have died after the message was sent but before the response had chance to come back.
I think the goal of the heartbeat logic is to detect when a socket is broken before a real message goes down it? So this does feel like a problem with that to me. Adding retries feels like it's working around a bug in that system.
I understand without retries you're getting "at most once delivery" and can expect messages to get lost, but if we improve the heartbeat logic this should be a lot more reliable. Before we implemented a workaround in our app (which I'm hoping we can remove if we can get a solution in socket.io at some point), we had reports that people would open there laptop and return to our app, then try and perform an action, which would hang indefinitely.
I think we shouldn't need to require users to move from "at most once" to "at least once" delivery to solve this problem.
This is the workaround we have for now:
// these need to match the config on the server
const pingInterval = 25000;
const pingTimeout = 20000;
function detectSocketDisconnection(socket) {
let lastPingTime = null
const onPing = () => {
lastPingTime = Date.now()
}
const checkForDisconnection = () => {
if (lastPingTime === null) return
const connectionExpired =
Date.now() - lastPingTime > pingInterval + pingTimeout
if (connectionExpired) {
console.log('Forcing disconnection...')
lastPingTime = null
socket.io.engine.close()
}
}
const originalEmit = socket.emit
socket.emit = (...args) => {
// This may cause a disconnect, which will then cause the actual emit message to be
// buffered and sent down the new connection when it's set up
checkForDisconnection()
return originalEmit.apply(socket, args)
}
setInterval(() => checkForDisconnection(), 5000)
window.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'visible') {
checkForDisconnection()
}
})
socket.io.on('ping', onPing)
socket.on('connect', onPing)
}
const socket = io()
detectSocketDisconnection(socket)
@tjenkinson OK, that totally makes sense :+1:
I've added a minor suggestion on the PR: https://github.com/socketio/socket.io/pull/5134
@tjenkinson again, sorry for the delay, I'd really like to get this sorted out.
Do you think it would be sufficient to use the visibilitychange event and send a noop packet when the page becomes visible?
Hey @darrachequesne sometimes when reopening the laptop I'm not seeing the visibility change fire so not sure how reliable it would be
Also when sending a noop packet it can still take some time before it realises the connection is gone and reconnects, and during that time the app might try and send other packets that get lost
And what about the setInterval solution that you suggested above? Do you think it could be reliable enough?
Something like this:
function customSetTimeout(callback, delay) {
let called = false, start = Date.now();
const timerId = setInterval(() => {
if (!called && Date.now() - start > delay) {
called = true;
callback();
}
}, Math.floor(delay / 100));
return {
reset() {
called = false;
start = Date.now();
},
clear() {
clearInterval(timerId);
}
}
}
Hmm it could never be as reliable or as efficient as checking just before sending the message. Set interval is also paused so the interval would need to be quite low and there's still a chance that the app might try and do something before the set interval fires
This should be fixed by https://github.com/socketio/socket.io/commit/8adcfbfde50679095ec2abe376650cf2b6814325, included in [email protected] and [email protected].
@tjenkinson thank you for your tenacity :angel: