socket.io icon indicating copy to clipboard operation
socket.io copied to clipboard

Heartbeat detection is delayed when timers throttled, and messages can be sent over expired connection

Open tjenkinson opened this issue 1 year ago • 7 comments

Describe the bug

Due to chrome (and maybe other browsers) throttling timers when the device sleeps, the heartbeat detection logic doesn't always run in time, causing messages to be lost before reconnection happens late.

To Reproduce

Socket.IO server version: 4.7.5

Server

import { Server } from "socket.io";

const io = new Server(3000, { pingTimeout: 5_000, pingInterval: 25_000});

io.on("connection", (socket) => {
  console.log(`connect ${socket.id}`);

  socket.on("disconnect", () => {
    console.log(`disconnect ${socket.id}`);
  });
});

Socket.IO client version: 4.7.5

Client

import { io } from "socket.io-client";

// use a tunnel like the ports feature in vscode to make the connection work over the internet and use that host here
const socket = io("ws://<replace with host>:3000/", {});

socket.on("connect", () => {
  console.log(`connect ${socket.id}`, new Date());
});

socket.on("disconnect", () => {
  console.log("disconnect", new Date());
});

Expected behavior

  • Load the page and wait for the connected message
  • Close the laptop with it unplugged so that it goes to sleep
  • Wait 60 seconds or so to make sure it's gone to sleep for a bit
  • Open the laptop and go back to the page
  • The latest connection should not have received a ping in the last 30 seconds, and therefore it should be closed. But because chrome froze the timer that's going to do the disconnection it will take a while longer before that actually happens
  • If a message is sent before the reconnection it will probably go into the void and be lost

Proposed fixes

Check if connection is expired before sending every message

I opened a PR for this here: https://github.com/socketio/socket.io/pull/5134

Let me know what you think

This doesn't cause the reconnect to happen immediately when the page is woken up, but it does mean attempting to send a message should trigger the reconnection, so outgoing messages shouldn't get lost.

Use setInterval and periodically check if a ping has not been received instead of setTimeout

I think this and storing the Date.now() of the last ping should be more accurate, and cause the reconnection to happen almost immediately when the page is woken up. Can also do the same on a visibilitychange event when the page goes back to visible.

Can look into a PR for this if you think it makes sense?

Platform:

  • Device: Mac
  • Browser: Chrome
  • OS: Mac OS 14.5 (23F79)

Additional context

https://github.com/socketio/socket.io/issues/3507#issuecomment-977681297 is another fix in the past related to throttled timers

tjenkinson avatar Jul 15 '24 14:07 tjenkinson

Hi! That's an interesting idea, thanks for the detailed analysis :+1:

However, I'm wondering whether handling such a corner case is really worth it. I don't think we will be able to handle all cases where the client fails to send a packet. If one really want a packet to be received by the server, aren't retries sufficient?

Reference: https://socket.io/docs/v4/client-options/#retries

darrachequesne avatar Jul 18 '24 07:07 darrachequesne

Hey @darrachequesne

The problem with retries is that it requires that you wait for the ack, and also with retrying you need to make sure you either are ok processing duplicate messages or are including some kind of identifier to make sure you can handle duplicate messages on the server, which is not trivial. I.e. the connection might not have already been dead, it might have died after the message was sent but before the response had chance to come back.

I think the goal of the heartbeat logic is to detect when a socket is broken before a real message goes down it? So this does feel like a problem with that to me. Adding retries feels like it's working around a bug in that system.

I understand without retries you're getting "at most once delivery" and can expect messages to get lost, but if we improve the heartbeat logic this should be a lot more reliable. Before we implemented a workaround in our app (which I'm hoping we can remove if we can get a solution in socket.io at some point), we had reports that people would open there laptop and return to our app, then try and perform an action, which would hang indefinitely.

I think we shouldn't need to require users to move from "at most once" to "at least once" delivery to solve this problem.

This is the workaround we have for now:

// these need to match the config on the server
const pingInterval = 25000;
const pingTimeout = 20000;

function detectSocketDisconnection(socket) {
  let lastPingTime = null
  const onPing = () => {
    lastPingTime = Date.now()
  }

  const checkForDisconnection = () => {
    if (lastPingTime === null) return

    const connectionExpired =
      Date.now() - lastPingTime > pingInterval + pingTimeout

    if (connectionExpired) {
      console.log('Forcing disconnection...')
      lastPingTime = null
      socket.io.engine.close()
    }
  }

  const originalEmit = socket.emit
  socket.emit = (...args) => {
    // This may cause a disconnect, which will then cause the actual emit message to be
    // buffered and sent down the new connection when it's set up
    checkForDisconnection()
    return originalEmit.apply(socket, args)
  }

  setInterval(() => checkForDisconnection(), 5000)

  window.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'visible') {
      checkForDisconnection()
    }
  })

  socket.io.on('ping', onPing)
  socket.on('connect', onPing)
}

const socket = io()
detectSocketDisconnection(socket)

tjenkinson avatar Jul 19 '24 11:07 tjenkinson

@tjenkinson OK, that totally makes sense :+1:

I've added a minor suggestion on the PR: https://github.com/socketio/socket.io/pull/5134

darrachequesne avatar Jul 22 '24 08:07 darrachequesne

@tjenkinson again, sorry for the delay, I'd really like to get this sorted out.

Do you think it would be sufficient to use the visibilitychange event and send a noop packet when the page becomes visible?

darrachequesne avatar Sep 16 '24 06:09 darrachequesne

Hey @darrachequesne sometimes when reopening the laptop I'm not seeing the visibility change fire so not sure how reliable it would be

Also when sending a noop packet it can still take some time before it realises the connection is gone and reconnects, and during that time the app might try and send other packets that get lost

tjenkinson avatar Sep 16 '24 13:09 tjenkinson

And what about the setInterval solution that you suggested above? Do you think it could be reliable enough?

Something like this:

function customSetTimeout(callback, delay) {
  let called = false, start = Date.now();

  const timerId = setInterval(() => {
    if (!called && Date.now() - start > delay) {
      called = true;
      callback();
    }
  }, Math.floor(delay / 100));

  return {
    reset() {
      called = false;
      start = Date.now();
    },
    clear() {
      clearInterval(timerId);
    }
  }
}

darrachequesne avatar Sep 17 '24 07:09 darrachequesne

Hmm it could never be as reliable or as efficient as checking just before sending the message. Set interval is also paused so the interval would need to be quite low and there's still a chance that the app might try and do something before the set interval fires

tjenkinson avatar Sep 17 '24 12:09 tjenkinson

This should be fixed by https://github.com/socketio/socket.io/commit/8adcfbfde50679095ec2abe376650cf2b6814325, included in [email protected] and [email protected].

@tjenkinson thank you for your tenacity :angel:

darrachequesne avatar Oct 22 '24 06:10 darrachequesne