fred.rs icon indicating copy to clipboard operation
fred.rs copied to clipboard

Connection management issues

Open birkmose opened this issue 1 year ago • 2 comments

Hi - Thank you for a great library!

We have been experiencing some connection management issues (including using latest version 5.2.0). Using fred as a redis client in a kubernetes cluster (redis is self-hosted in k8s) - if the redis instance is moved to another node, we end up in a corrupted state with connection management.

This can simply be reproduced doing kubectl scale deploy/redis --replicas 0.

We have tested with/without pipelining. With/without connection pool. Config is using exponential reconnect policy + command timeout. This is a trace of the events.

Initially we get a connection - everything is fine:

 DEBUG fred::multiplexer::commands > fred-JUKIBXKdhV: Initializing connections...
 TRACE fred::protocol::types       > fred-JUKIBXKdhV: Using 10.0.78.183 among 1 possible socket addresses for redis:6379
 TRACE fred::multiplexer::utils    > fred-JUKIBXKdhV: Connecting to 10.0.78.183:6379
 TRACE mio::poll                   > registering event source with poller: token=Token(0), interests=READABLE | WRITABLE
 DEBUG fred::protocol::connection  > fred-JUKIBXKdhV: Skip setting client name.
 TRACE redis_protocol::utils       > allocating more, len: 0, amt: 24
 TRACE fred::protocol::codec       > fred-JUKIBXKdhV: Encoded 24 bytes to 10.0.78.183:6379. Buffer len: 24 (RESP2)
 TRACE tokio_util::codec::framed_impl > flushing framed transport
 TRACE tokio_util::codec::framed_impl > writing; remaining=24
 TRACE tokio_util::codec::framed_impl > framed transport flushed
 TRACE tokio_util::codec::framed_impl > attempting to decode a frame
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Recv 4 bytes from 10.0.78.183:6379 (RESP2).
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Parsed 4 bytes from 10.0.78.183:6379
 TRACE tokio_util::codec::framed_impl > frame decoded from buffer
 DEBUG fred::protocol::connection     > fred-JUKIBXKdhV: Read client ID: Resp2(Integer(7))
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Creating new close tx sender.
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Set centralized connection closed sender.
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Emitting connect message.

Now we scale down the deployment:

kubectl scale deploy/redis --replicas 0

Fred realizes that the connection is lost:

 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Starting command stream...
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Set inner connection closed sender.
 TRACE tokio_util::codec::framed_impl > attempting to decode a frame
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Recv 0 bytes from 10.0.78.183:6379 (RESP2).
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Recv 0 bytes from 10.0.78.183:6379 (RESP2).
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Redis frame stream closed with error Redis Error - kind: Canceled, details: Canceled.
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Emit connection closed from error: Redis Error - kind: Canceled, details: Canceled.
 TRACE fred::multiplexer::utils       > fred-JUKIBXKdhV: Emitting connection closed with 0 messages
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Recv reconnect message with 0 commands. State: Disconnected
 INFO  fred::multiplexer::commands    > fred-JUKIBXKdhV: Sleeping for 132 ms before reconnecting
 TRACE fred::protocol::types          > fred-JUKIBXKdhV: Using 10.0.78.183 among 1 possible socket addresses for redis:6379
 TRACE fred::multiplexer::utils       > fred-JUKIBXKdhV: Connecting to 10.0.78.183:6379
 TRACE mio::poll                      > registering event source with poller: token=Token(1), interests=READABLE | WRITABLE

Notice at this time there are no pods running redis, so the reconnect will just hang. Now we attempt to execute a command against redis, using a client from the connection pool:

 TRACE fred::multiplexer::commands    > fred-JUKIBXKdhV: Recv command on multiplexer SET. Buffer len: 0
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Will block multiplexer loop waiting on SET to finish.
 TRACE fred::multiplexer              > fred-JUKIBXKdhV: Skip waiting on cluster sync.
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Writing command SET to redis:6379
 TRACE fred::protocol::connection     > fred-JUKIBXKdhV: Sending command and flushing the sink.
 TRACE redis_protocol::utils          > allocating more, len: 0, amt: 40
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Encoded 40 bytes to 10.0.78.183:6379. Buffer len: 40 (RESP2)
 TRACE tokio_util::codec::framed_impl > flushing framed transport
 TRACE tokio_util::codec::framed_impl > writing; remaining=40
 WARN  fred::multiplexer::commands    > fred-JUKIBXKdhV: Error writing command None: Redis Error - kind: IO, details: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Reconnecting or stopping due to error: Redis Error - kind: IO, details: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Emitting close all sockets message: Redis Error - kind: IO, details: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
 WARN  fred::multiplexer::utils       > fred-JUKIBXKdhV: Error sending close message to socket streams: SendError(Redis Error - kind: IO, details: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Waiting for client to reconnect...
resp Err(Redis Error - kind: Timeout, details: Request timed out.)

Next we scale up redis:

kubectl scale deploy/redis --replicas 1

After a while the connection pool will reconnect:

 DEBUG fred::protocol::connection     > fred-JUKIBXKdhV: Skip setting client name.
 TRACE redis_protocol::utils          > allocating more, len: 0, amt: 24
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Encoded 24 bytes to 10.0.78.183:6379. Buffer len: 24 (RESP2)
 TRACE tokio_util::codec::framed_impl > flushing framed transport
 TRACE tokio_util::codec::framed_impl > writing; remaining=24
 TRACE tokio_util::codec::framed_impl > framed transport flushed
 TRACE tokio_util::codec::framed_impl > attempting to decode a frame
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Recv 4 bytes from 10.0.78.183:6379 (RESP2).
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Parsed 4 bytes from 10.0.78.183:6379
 TRACE tokio_util::codec::framed_impl > frame decoded from buffer
 DEBUG fred::protocol::connection     > fred-JUKIBXKdhV: Read client ID: Resp2(Integer(3))
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Set centralized connection closed sender.
 TRACE mio::poll                      > deregistering event source from poller
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Reconnect task finished reconnecting or syncing with: Ok(())
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Sending 0 commands after reconnecting.
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Emitting connect message.
 TRACE tokio_util::codec::framed_impl > attempting to decode a frame
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Recv 0 bytes from 10.0.78.183:6379 (RESP2).

And now we have the problem - if we try to execute a command using the client from the connection pool:

 TRACE fred::multiplexer::commands    > fred-JUKIBXKdhV: Recv command on multiplexer SET. Buffer len: 0
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Will block multiplexer loop waiting on SET to finish.
 TRACE fred::multiplexer              > fred-JUKIBXKdhV: Skip waiting on cluster sync.
 DEBUG fred::multiplexer::utils       > fred-JUKIBXKdhV: Writing command SET to redis:6379
 TRACE fred::protocol::connection     > fred-JUKIBXKdhV: Sending command and flushing the sink.
 TRACE redis_protocol::utils          > allocating more, len: 0, amt: 40
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Encoded 40 bytes to 10.0.78.183:6379. Buffer len: 40 (RESP2)
 TRACE tokio_util::codec::framed_impl > flushing framed transport
 TRACE tokio_util::codec::framed_impl > writing; remaining=40
 TRACE tokio_util::codec::framed_impl > framed transport flushed
 DEBUG fred::multiplexer::commands    > fred-JUKIBXKdhV: Waiting on last request to finish without pipelining.
 TRACE tokio_util::codec::framed_impl > attempting to decode a frame
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Recv 5 bytes from 10.0.78.183:6379 (RESP2).
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Parsed 5 bytes from 10.0.78.183:6379
 TRACE tokio_util::codec::framed_impl > frame decoded from buffer
 TRACE fred::multiplexer::responses   > fred-JUKIBXKdhV: Processing response from redis:6379 to SET with frame kind SimpleString
 TRACE fred::multiplexer::responses   > fred-JUKIBXKdhV: Writing to multiplexer sender to unblock command loop.
 WARN  fred::multiplexer::responses   > fred-JUKIBXKdhV: Error sending cmd loop response: ()
 TRACE fred::multiplexer::responses   > fred-JUKIBXKdhV: Responding to caller for SET
 WARN  fred::multiplexer::responses   > fred-JUKIBXKdhV: Failed to respond to caller.
 TRACE tokio_util::codec::framed_impl > attempting to decode a frame
 TRACE fred::protocol::codec          > fred-JUKIBXKdhV: Recv 0 bytes from 10.0.78.183:6379 (RESP2).
resp Err(Redis Error - kind: Timeout, details: Request timed out.)

The command is actually executed (we can verify this using redis-cli and checking that the key has been SET in the redis cluster.). All following requests using this client will now result in timeouts (although the command is in fact executed!) .

We use tokio v1.20.1 for reference.

If we use a headless service in k8s, we do not experience this problem (as the DNS will resolve with 0 ip-addresses when the deployment is scaled to 0 replicas, and resolve the new ip-address once the pod has been moved to the other node).

birkmose avatar Sep 01 '22 15:09 birkmose

Hi @birkmose, thanks for the comprehensive bug report.

This seems related to https://github.com/aembke/fred.rs/issues/59, at least in terms of the underlying issue with the implementation. Unfortunately the fix to either is highly invasive, but that's what I've been working on for the last several weeks. Version 6 will fix both and I'm hoping to have that out in the next week or two.

Just as a heads up - I may reach out in the next week or two for some feedback on a CI repro for this. So far I've tried to avoid adding minikube to the mix in CI, but that seems unavoidable at this point.

aembke avatar Sep 01 '22 16:09 aembke

I may reach out in the next week or two for some feedback on a CI repro for this

Happy to help out!

birkmose avatar Sep 02 '22 07:09 birkmose

We are also running this library on a redis cluster running on k8s and ran into the same issue.

bodymindarts avatar Sep 23 '22 18:09 bodymindarts

Hi @aembke Is there any news/progress regarding the issue?

DDtKey avatar Oct 18 '22 11:10 DDtKey