
Error: timeout reached while waiting for fetchSockets response

Open donfire opened this issue 1 year ago • 5 comments

We are using two versions of Socket.IO:
v2.5.0 => "socket.io-redis": "^5.4.0" and v4.7.1 => "@socket.io/redis-adapter": "^8.0.1"

Transport is working fine when we are pushing messages, but whenever we run const sockets = await io.to('bridge-room').fetchSockets(); on v4.7.1 we get the following error:

/node_modules/@socket.io/redis-adapter/dist/index.js:712
                    reject(new Error("timeout reached while waiting for fetchSockets response"));
                           ^

Error: timeout reached while waiting for fetchSockets response

I have been banging my head against this for a while now.

donfire avatar Jul 07 '23 10:07 donfire

Hi! If I understand correctly, you have a cluster of servers, some in v2.5.0 and some in v4.7.1.

I think the problem is that the fetchSockets() operation was added in v4, so the v4 server sends the operation to all the servers but only the other v4 servers respond, and the operation times out (even though it has already received a response from every server that is able to reply).

Not sure how to handle this though. Maybe some kind of feature detection?

darrachequesne avatar Aug 11 '23 11:08 darrachequesne

I am seeing this as well

cody-evaluate avatar Oct 28 '23 02:10 cody-evaluate

me too

introspection3 avatar Nov 05 '23 11:11 introspection3

Any updates ???, still encountering the same issue

HannaSamia avatar Jan 18 '24 10:01 HannaSamia

I was gonna make an issue but I'm glad this is here. This comment is just an investigation "report" of the code and why it behaves this way (AFAIK). First of all, I'm not a pro at Redis.

My best guess is that there are multiple things going on here. Some steps to investigate:

  1. Use the timeout() function to test whether the cause of the issue is really just the request taking too long: io.timeout(20000).fetchSockets()

  2. If (1) didn't help, maybe add a check somewhere that keeps verifying the heartbeat of all your instances?

  3. Check your Redis connection on each instance?

These lines below handle the responses and expect (wait for) K responses, where K is the number of other instances. So if you have 3 instances running, each will expect 2 responses (3 minus self) and will not clear the timeout until all of them are received. https://github.com/socketio/socket.io-redis-adapter/blob/cdb55353f83c78cabe9788683e4dd93ac4cd50c9/lib/index.ts#L553C1-L559C10

This is the code that generates the timeout error (for reference):

https://github.com/socketio/socket.io-redis-adapter/blob/cdb55353f83c78cabe9788683e4dd93ac4cd50c9/lib/index.ts#L755C1-L763C32

Note that this will unconditionally throw an error.

My point (3) is arguably the least likely, but never say never. The reason I mention it is that the expected number of responses K is computed via https://github.com/socketio/socket.io-redis-adapter/blob/cdb55353f83c78cabe9788683e4dd93ac4cd50c9/lib/index.ts#L736 which uses Redis.

TLDR:

  • We get the number K of remote instances
  • Send a request to all instances and start a timeout
  • As responses arrive, we compare the number of responses received (H) with K
  • If H = K --> clearTimeout; else do nothing

Takeaway: for the timeout not to clear, at least one remote instance has to fail to respond.

My proposed solution to the maintainers (cc @darrachequesne): if at least one instance sent a valid response, instead of throwing "timeout reached while waiting for fetchSockets response" we should still resolve the promise and log a warning like "fetchSockets: only H responses received within timeout, expected K".

Benjythebee avatar May 29 '24 20:05 Benjythebee