StackExchange.Redis
`ConnectionMultiplexer.IsConnected` not detecting lost connections
I have a situation where `IsConnected` continues to return true for ~20 minutes after the connection is gone. The web service keeps issuing commands during that time, and they all time out. I would expect the connection state to be updated to reflect the lost connection once timeouts start appearing.
Unfortunately this only happens in deployed environments. Locally, StackExchange.Redis detects the lost connection immediately, but when deployed to various environments it takes ~20 minutes as mentioned above.
Is this on Linux? I ask because we've seen this before with huge socket timeouts configured at the OS level.
Yes, the deployed environment is a Linux-based Docker image running in Kubernetes.
I ran a local test using the same Dockerfile that is used to build the service for remote deployments and ran it in my local Docker. It detected the connection interruption immediately (in less than a second, I'd say).
Going to work with our devops team to look at the rest of the deployed network configuration. Could be a timeout somewhere along the path we have configured that is super high.
Details on the ~15min connection stalls on Linux: https://github.com/StackExchange/StackExchange.Redis/issues/1848#issuecomment-913064646
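For a rough sense of where a figure like ~15 minutes can come from, here is a minimal sketch of the arithmetic. It assumes the stall is bounded by the Linux TCP retransmission timeout (`net.ipv4.tcp_retries2`, default 15), with the kernel's usual minimum and maximum RTO of 200 ms and 120 s. Those values are assumptions about a stock kernel, not something measured in this thread, and the function name is made up for illustration.

```python
def retransmission_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Approximate seconds before Linux gives up on an unacknowledged segment.

    Assumes exponential backoff starting at rto_min and capped at rto_max.
    With tcp_retries2 = N, the connection is killed at the (N+1)th RTO,
    so N + 1 backoff intervals elapse in total.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # double the wait, up to the cap
    return total


seconds = retransmission_timeout()
print(round(seconds, 1))       # 924.6
print(round(seconds / 60, 1))  # 15.4
```

The real timeout depends on the measured round-trip time, so this is a lower bound on a stock configuration rather than an exact number. Passing a smaller `retries` value shows how lowering `tcp_retries2` would shorten the stall.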
Kubernetes environments can also see connection problems for various reasons, including noisy neighbor pods, node maintenance, or Envoy sidecars intercepting network traffic. If all else fails, a packet capture might provide some insight.
The best info we have is above, so I'm closing this out to clean up.
Update: a new version 2.7.10 has been released, including #2610 to detect and recover stalled sockets. This should help prevent the situation where connections can stall for ~15 minutes on Linux clients.