socket.io-client 4.8.0 automatic reconnect is not working

Open sruetzler opened this issue 1 year ago • 17 comments

Describe the bug: I have a client that connects to a server. If the server stops and restarts, the connect event is no longer emitted.

In version 4.7.5, the connect event was emitted when the server restarted and reopened the WebSocket.

Socket.IO client version: 4.8.0

Expected behavior: The connect event should be emitted automatically when the server restarts and reopens the WebSocket.

Platform: Node.js 16 on Ubuntu 20.04, and also Chromium 128.0.6613.119

Additional context: On 4.7.5, I get this connect_error event multiple times until the client reconnects.

TransportError: xhr poll error      
    at Polling.onError (/home/sruetzler/data/workspace/gitlab/target/installer/rauc-api-client/node_modules/engine.io-client/build/cjs/transport.js:47:37)                                                         
    at Request.<anonymous> (/home/sruetzler/data/workspace/gitlab/target/installer/rauc-api-client/node_modules/engine.io-client/build/cjs/transports/polling.js:238:18)                                           
    at Request.Emitter.emit (/home/sruetzler/data/workspace/gitlab/target/installer/rauc-api-client/node_modules/@socket.io/component-emitter/lib/cjs/index.js:143:20)
    at Request.onError (/home/sruetzler/data/workspace/gitlab/target/installer/rauc-api-client/node_modules/engine.io-client/build/cjs/transports/polling.js:343:14)
    at Timeout._onTimeout (/home/sruetzler/data/workspace/gitlab/target/installer/rauc-api-client/node_modules/engine.io-client/build/cjs/transports/polling.js:316:30)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7) {
  description: 0,
  context: XMLHttpRequest {
    UNSENT: 0,
    OPENED: 1,
    HEADERS_RECEIVED: 2,
    LOADING: 3,
    DONE: 4,
    readyState: 4,
    onreadystatechange: [Function (anonymous)],
    responseText: 'Error: connect ECONNREFUSED 192.168.70.132:443\n' +
      '    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1161:16)',
    responseXML: '',
    status: 0,
    statusText: Error: connect ECONNREFUSED 192.168.70.132:443
        at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1161:16) {
      errno: -111,
      code: 'ECONNREFUSED',
      syscall: 'connect',
      address: '192.168.70.132',
      port: 443
    },
    open: [Function (anonymous)],
    setDisableHeaderCheck: [Function (anonymous)],
    setRequestHeader: [Function (anonymous)],
    getResponseHeader: [Function (anonymous)],
    getAllResponseHeaders: [Function (anonymous)],
    getRequestHeader: [Function (anonymous)],
    send: [Function (anonymous)],
    handleError: [Function (anonymous)],
    abort: [Function (anonymous)],
    addEventListener: [Function (anonymous)],
    removeEventListener: [Function (anonymous)],
    dispatchEvent: [Function (anonymous)]
  },
  type: 'TransportError'
}

On 4.8.0, I get this error once, and after that I get the following endlessly:

No transports available

sruetzler avatar Sep 24 '24 12:09 sruetzler

We're also seeing that it doesn't reconnect automatically when the connection breaks while using the websocket transport.

FredrikAugust avatar Sep 30 '24 14:09 FredrikAugust

Is it possible for you to create a minimal repo that simulates the issue? @sruetzler

isarikaya avatar Oct 02 '24 10:10 isarikaya

Hi. Just commenting here so I get further notifications as this potentially affects our product.

In the meantime, we locked the socket.io-related dependencies at ~4.7.5.

Because of the new npm audit reports, we also added an override pinning cookie to ^0.7.2.

Thanks.

jsilvawbc avatar Oct 10 '24 17:10 jsilvawbc

I tried to reproduce this in a short example, but so far I could not. At this point I don't know what is different in my code that makes it fail. Perhaps someone else could help. What about @jsilvawbc or @FredrikAugust? Can you help? Do you know what's different, or do you have a simple example that reproduces this problem?

sruetzler avatar Oct 11 '24 05:10 sruetzler

@sruetzler

We initialise the io client with these options:

		{
			auth: xxx,
			transports: ['websocket', 'polling'],
			withCredentials: true,
			reconnectionDelay: 100,
			reconnectionDelayMax: 1000,
			rememberUpgrade: true,
			closeOnBeforeunload: true
		};

We then observe that if you, for example, kill the backend server, the client simply does not attempt to reconnect, even though it should according to the docs.
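
To put the `reconnectionDelay: 100` and `reconnectionDelayMax: 1000` values above in perspective, here is a rough sketch of the window in which each reconnection attempt would be scheduled. It assumes the backoff described in the client docs: the delay doubles on each attempt, is jittered by `randomizationFactor` (assumed here to be the default of 0.5), and is capped at `reconnectionDelayMax`. `reconnectDelayBounds` and `BackoffOptions` are hypothetical names, not part of socket.io-client.

```typescript
// Hypothetical helper, not part of socket.io-client: approximates the
// window in which the n-th reconnection attempt would be scheduled.
interface BackoffOptions {
  reconnectionDelay: number; // initial delay in ms
  reconnectionDelayMax: number; // upper cap in ms
  randomizationFactor: number; // jitter, between 0 and 1
}

function reconnectDelayBounds(
  attempt: number, // 0 for the first reconnection attempt
  opts: BackoffOptions,
): [number, number] {
  // Exponential growth: the base delay doubles with every attempt.
  const base = opts.reconnectionDelay * Math.pow(2, attempt);
  const lower = base * (1 - opts.randomizationFactor);
  const upper = base * (1 + opts.randomizationFactor);
  // Both ends of the window are capped at reconnectionDelayMax.
  const cap = opts.reconnectionDelayMax;
  return [Math.min(lower, cap), Math.min(upper, cap)];
}

const opts: BackoffOptions = {
  reconnectionDelay: 100,
  reconnectionDelayMax: 1000,
  randomizationFactor: 0.5, // assumed default, not set in the options above
};

for (let attempt = 0; attempt < 5; attempt++) {
  const [lower, upper] = reconnectDelayBounds(attempt, opts);
  console.log(`attempt ${attempt + 1}: between ${lower} and ${upper} ms`);
}
```

Under these assumptions every retry would be scheduled within about a second, so seeing no reconnection attempt at all suggests that none is being scheduled, rather than that the delays are long.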

FredrikAugust avatar Oct 11 '24 08:10 FredrikAugust

@sruetzler

We initialise the io client with these options:

		{
			auth: xxx,
			transports: ['websocket', 'polling'],
			withCredentials: true,
			reconnectionDelay: 100,
			reconnectionDelayMax: 1000,
			rememberUpgrade: true,
			closeOnBeforeunload: true
		};

We then observe that if you, for example, kill the backend server, the client simply does not attempt to reconnect, even though it should according to the docs.

this.client.on('connect_error', (error) => {
	// If the client is not active, it will not retry on its own, so reconnect manually
	if (TypeUtils.isFalse(this.client?.active)) {
		this.logger.warning(`[SocketClient] Connection error occurred: ${error.message}`, {
			error
		});
		this.client.io.connect();
		return;
	}

	// If the client is active, this indicates a transient issue and it should try to reconnect
	this.logger.error(`[SocketClient] Transient connection error occurred: ${error.message}`, {
		error
	});
});

We concluded that the socket should try to reconnect automatically based on this listener for the connect_error event. Yet even though it logged the error as transient, a reconnect attempt was never initiated.

None of the listener functions below were ever triggered when the backend was killed and the transient connection error mentioned above was logged:

client.io.on('reconnect_attempt', () => {
	this.logger.info('[SocketClient] Initiating attempt to reconnect to socket...');
});

client.io.on('reconnect', (attempt) => {
	this.logger.info(`[SocketClient] Reconnected after ${attempt} attempts`);
});

client.io.on('reconnect_failed', () => {
	this.logger.error('[SocketClient] Reconnection failed after allotted attempts');
});

client.io.on('reconnect_error', (error) => {
	this.logger.error(`[SocketClient] Reconnection error: ${error.message}`, error);
});
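
When putting together a reproduction, it may help to record every reconnection-related Manager event in one place so the exact sequence (or the absence of one) can be attached to the report. The sketch below registers one listener per event name used above, plus `error`. `EmitterLike` and `traceManagerEvents` are hypothetical names; the interface models only `on()` so the snippet runs without the library. With the real client you would pass `socket.io`, and a cast may be needed under strict typing.

```typescript
// Minimal structural type: anything with an `on(event, listener)` method.
interface EmitterLike {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

// Manager-level events related to reconnection.
const MANAGER_EVENTS = [
  "error",
  "reconnect_attempt",
  "reconnect",
  "reconnect_error",
  "reconnect_failed",
] as const;

// Hypothetical helper: logs one line per Manager event so the order
// in which events fire can be compared between client versions.
function traceManagerEvents(
  manager: EmitterLike,
  log: (line: string) => void = console.log,
): void {
  for (const event of MANAGER_EVENTS) {
    manager.on(event, (...args: unknown[]) => {
      const detail = args.map(String).join(" ");
      log(detail ? `[manager] ${event} ${detail}` : `[manager] ${event}`);
    });
  }
}
```

Running the same trace against 4.7.5 and 4.8.0 while restarting the server should show at which point the two versions diverge.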

AndersRobstad avatar Oct 11 '24 08:10 AndersRobstad

I am also facing this issue. It's pretty major since our "online users" counter just drops to 0 anytime we push updates to our code base. I think going back to 4.7.5 is the move for now

wes337 avatar Oct 14 '24 13:10 wes337

Hi everyone, sorry for the delay.

I was not able to reproduce the issue:

  • without transports
  • with transports: ["polling", "websocket"]
  • with transports: ["websocket"]

"No transports available" suggests the transports array is empty, but I don't know how this could happen. This might be linked to this change.

darrachequesne avatar Oct 22 '24 06:10 darrachequesne

I would need some additional information.

Does it happen with the client bundle? Or with a bundler (webpack, rollup, ...)? In that case, could you please provide your configuration?

Does it happen randomly? Always?

Does it happen in all browsers?

Thanks in advance.

darrachequesne avatar Oct 24 '24 08:10 darrachequesne

Hey @darrachequesne

  • I am using NextJS (14.2.5)
  • It happens always
  • It happens in all browsers

Here's how our frontend connects

export const socket = io(SOCKET_SERVER_URL, {
  autoConnect: false,
  transports: ["websocket"], // <--- Could it be related to this?
  timeout: 10000,
  auth: (callback) => {
    const token = getToken()
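    if (token) {
      callback({
        token,
      });
    } else {
      // Note: callback is not invoked on this path, so the handshake
      // (CONNECT packet) is presumably never sent.
      console.log("No token found when trying to connect to Socket");
    }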
  },
  parser: customParser, // <--- We're using socket.io-msgpack-parser here
});

wes337 avatar Oct 30 '24 06:10 wes337

I have the same problem. Has anyone solved it?

joweste avatar Apr 08 '25 19:04 joweste

Not that I know of @joweste. We ended up writing our own logic...
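
The logic referred to above was not posted. Purely as an illustration, one possible approach is a small watchdog that checks `connected` periodically and calls `connect()` when the socket is down. `ReconnectableSocket`, `Watchdog`, `createReconnectWatchdog`, and the 5-second interval are all hypothetical; the interface models only the two members used, which socket.io-client's `Socket` also exposes.

```typescript
// Only the members this sketch relies on.
interface ReconnectableSocket {
  connected: boolean;
  connect(): unknown;
}

interface Watchdog {
  tick(): boolean; // true if a manual connect() was issued
  start(): void;
  stop(): void;
}

// Hypothetical helper: periodically forces a connect() while disconnected.
function createReconnectWatchdog(
  socket: ReconnectableSocket,
  intervalMs = 5000, // arbitrary interval chosen for illustration
): Watchdog {
  let timer: ReturnType<typeof setInterval> | undefined;

  const tick = (): boolean => {
    if (socket.connected) {
      return false; // nothing to do
    }
    socket.connect();
    return true;
  };

  return {
    tick,
    start() {
      if (timer === undefined) {
        timer = setInterval(tick, intervalMs);
      }
    },
    stop() {
      if (timer !== undefined) {
        clearInterval(timer);
        timer = undefined;
      }
    },
  };
}
```

A caller would create the watchdog once, call `start()` after the first connection attempt, and call `stop()` before disconnecting on purpose so the watchdog does not reopen the connection.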

FredrikAugust avatar Apr 09 '25 14:04 FredrikAugust

I have the same problem. Has anyone solved it?

No, still can't upgrade past 4.7.5

wes337 avatar Apr 09 '25 19:04 wes337

Dang, I'm also facing this issue in 4.8

sims11tz avatar May 16 '25 17:05 sims11tz

Are there any updates on this, such as a roadmap or the like? Or perhaps any alternatives worth considering? This isn't a great sign in terms of future reliability.

FredrikAugust avatar May 19 '25 07:05 FredrikAugust

We are not using auth() and instead do auth manually in allowRequest(); we have had no issues since then.
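
The comment above does not show the implementation. As a rough sketch of the idea, a server-side `allowRequest` handler can read a credential from the handshake request and accept or reject it through the callback. Everything specific here is an assumption: the `Authorization: Bearer` header, the `isValidToken` placeholder, and the `RequestLike` type that stands in for Node's `IncomingMessage`. Browsers cannot set custom headers on the WebSocket handshake, so a cookie or query parameter may be needed there instead.

```typescript
// Stand-in for the parts of IncomingMessage used by this sketch.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
}

type AllowCallback = (err: string | null, success: boolean) => void;

// Hypothetical placeholder: swap in real verification (for example a JWT check).
function isValidToken(token: string): boolean {
  return token === "secret-demo-token";
}

// Same (request, callback) shape as the allowRequest handler shown in this thread.
function allowRequest(req: RequestLike, callback: AllowCallback): void {
  const header = req.headers["authorization"];
  const value = Array.isArray(header) ? header[0] : header;
  const prefix = "Bearer ";
  const token =
    value !== undefined && value.startsWith(prefix)
      ? value.slice(prefix.length)
      : undefined;
  // Always invoke the callback, with `false` when the token is missing or invalid.
  callback(null, token !== undefined && isValidToken(token));
}
```

Unlike a client-side `auth` function, this handler invokes its callback on every path.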

sep2 avatar May 22 '25 14:05 sep2

@FredrikAugust as I said earlier, I wasn't able to reproduce at all... It looks like an environment-specific issue, otherwise I think that we would have a lot more user reports.

darrachequesne avatar May 22 '25 15:05 darrachequesne

I'm facing the same issue. What is the workaround? Should I move back to version 4.7.5?

linushahs avatar Dec 17 '25 16:12 linushahs

@linushahs hi! How can we reproduce the issue? Which platform/bundler are you using?

darrachequesne avatar Dec 17 '25 17:12 darrachequesne

Maybe I can provide more of my setup to help you reproduce it. Here's how our server is set up.

It's an Express.js server, and we're using a couple of other things, such as helmet, express-rate-limit, and socket.io-msgpack-parser.

export let io: Server;

function setupServer() {
  const app = express();
  app.use(helmet());
  app.use(
    cors({
      origin: ALLOWED_ORIGINS, // An array of strings
    }),
  );

  const limiter = rateLimit({
    windowMs: 15 * 60 * 1000,
    max: 100,
    keyGenerator: (request) => requestIp.getClientIp(request),
  });

  app.use(limiter);

  const server = createServer(app);

  const config: Partial<ServerOptions> = {
    cors: {
      origin: ALLOWED_ORIGINS,
      credentials: true,
    },
    allowRequest,
    connectionStateRecovery: {
      maxDisconnectionDuration: 2 * 60 * 1000,
      skipMiddlewares: true,
    },
    parser: customParser, // socket.io-msgpack-parser
    cleanupEmptyChildNamespaces: true,
    adapter: createAdapter(Redis.ioPublisher, Redis.ioSubscriber),
    cookie: COOKIE_OPTIONS,
  };

  io = new Server(server, config);

  io.engine.use(helmet());
}

const COOKIE_OPTIONS = {
  secure: true,
  sameSite: "lax",
  path: "/",
  domain: IS_STAGING ? "" : COOKIE_ROOT_DOMAIN,
  maxAge: 1000 * 60 * 60 * 24 * 365 * 10,
};

async function allowRequest(
  request: IncomingMessage,
  callback: (error: string | null, success: boolean) => void,
) {
  try {
    const origin = request.headers.origin;
    const allowedOrigin = ALLOWED_ORIGINS.includes(origin);
    callback(null, allowedOrigin);
  } catch (error) {
    console.log(error?.message);
    callback(null, false);
  }
}

export default setupServer;

I'm not sure, but could it be related to any of this config? Maybe the allowRequest function and the connectionStateRecovery config?

wes337 avatar Dec 18 '25 07:12 wes337

@FredrikAugust as I said earlier, I wasn't able to reproduce at all... It looks like an environment-specific issue, otherwise I think that we would have a lot more user reports.

A lot of people are having this issue, but they don't understand what's happening. There are several tickets open about this:

#5299 #5327 #5330

They all started after upgrading from version 4.7.5.

@darrachequesne

wes337 avatar Dec 18 '25 07:12 wes337