
Cluster - Connecting to 0.0.0.0:6379 for some reason

Open ghost opened this issue 1 year ago • 6 comments

Hello there,

I am connecting to a cluster with 1 master and 2 replicas. The code is pretty simple:

using System.Net.Security;
using System.Security.Cryptography.X509Certificates;
using StackExchange.Redis;

var configuration = new ConfigurationOptions();
configuration.EndPoints.Add("10.254.61.60", 6379);
configuration.EndPoints.Add("10.254.61.61", 6379);
configuration.EndPoints.Add("10.254.61.62", 6379);
configuration.Ssl = true;
configuration.User = "admin";
configuration.Password = "password";
// Accept certificates signed by our internal CA (see the P.S. below).
configuration.CertificateValidation += (object sender, X509Certificate? certificate, X509Chain? chain, SslPolicyErrors sslPolicyErrors) => { return true; };
// This call takes ~5 seconds to complete.
ConnectionMultiplexer redis = ConnectionMultiplexer.Connect(configuration, Console.Out);

But the connection takes up to 5 seconds, because for some reason the client is also trying to connect to a 0.0.0.0:6379 endpoint, which takes about 4.7 seconds to fail. I have no idea why it would try to do that. There is no error in the logs, nor any explanation I could find. logs.txt

P.S. The certificate validation override is required because we are using certs signed by our internal CA, which my local PC doesn't trust. The app running in production doesn't have that line but has the same issue.
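
(Aside: the ~4.7 s spent on the bad endpoint lines up with the library's default connect timeout of roughly 5 seconds. A minimal sketch, assuming you only want to cap how long an unreachable endpoint can stall startup while investigating; the 2000 ms value is illustrative, not a recommendation, and this only shortens the symptom rather than fixing the root cause discussed below.)

using StackExchange.Redis;

var options = new ConfigurationOptions();
options.EndPoints.Add("10.254.61.60", 6379);   // plus the other nodes, as above
options.Ssl = true;
// Cap how long any single endpoint (e.g. the unreachable 0.0.0.0:6379) can
// stall the handshake; the library default is ~5 seconds.
options.ConnectTimeout = 2000;                 // milliseconds, illustrative value
// With this set to false, Connect does not throw when a server is unavailable;
// the multiplexer keeps retrying that endpoint in the background instead.
options.AbortOnConnectFail = false;
var redis = ConnectionMultiplexer.Connect(options, Console.Out);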

ghost avatar Oct 24 '23 12:10 ghost

Is it possible for you to connect to one of the servers with redis-cli and issue the command cluster nodes? I'm very curious what the server responds with there.

mgravell avatar Oct 24 '23 16:10 mgravell

Master:
127.0.0.1:6379> cluster nodes
7af72812710708717f3dee924254ac77a8f0cbc3 0.0.0.0:6379@16379 myself,master - 0 0 0 connected 0-16383

Replica 1:
127.0.0.1:6379> cluster nodes
7af72812710708717f3dee924254ac77a8f0cbc3 10.254.61.60:6379@16379 master - 0 1698221169052 0 connected 0-16383
127c5d1020c09098e010e19d96294b1dad1fff30 0.0.0.0:6379@16379 myself,slave 7af72812710708717f3dee924254ac77a8f0cbc3 0 0 0 connected

Replica 2:
127.0.0.1:6379> cluster nodes
7af72812710708717f3dee924254ac77a8f0cbc3 10.254.61.60:6379@16379 master - 0 1698221165064 0 connected 0-16383
f97c9da47d911ac99f56b11170a8d325e3edf3a3 0.0.0.0:6379@16379 myself,slave 7af72812710708717f3dee924254ac77a8f0cbc3 0 0 0 connected

Well, I am baffled. The master doesn't know about the replicas, and the replicas know only about the master. I am not sure what went wrong.

ghost avatar Oct 25 '23 08:10 ghost

Hey,

I'm a colleague of @Qualatea, trying to dig out more information as he's currently occupied.

Could this be related to the cluster-announce-ip 0.0.0.0 option being set on both the master and the replicas? I was digging through the config for more information, and this option is documented under the Docker/NAT support section.

We are running the master and replicas on VMs with static IPs, with the ports exposed directly to the clients, so could commenting out the cluster-announce options solve this? Or simply setting them to the actual static IP of each node?

Thanks.

David

dergyitheron avatar Oct 25 '23 08:10 dergyitheron

cluster-announce-ip is the IP that the node announces. It's there so that when Redis or a client does exactly what StackExchange.Redis is doing (asking the cluster about its topography), it can point them to the correct IP address. So yes, that's definitely why you're seeing this behavior. You most likely should be able to comment out that line (and make sure the other cluster-announce-* fields aren't configured oddly either, naturally).

The classic case where it would be necessary is if you were running your cluster inside of a docker network and wanted to do some manual NAT for your Redis instance.
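
For anyone landing here with the same symptom, a sketch of the redis.conf side of that fix, using the addresses from this thread (apply each node's own address on that node; the port directives are only needed when the externally reachable ports differ from the bound ones):

# redis.conf on the node reachable at 10.254.61.60 (repeat per node with its own IP)
# Either drop the wildcard announce address entirely...
# cluster-announce-ip 0.0.0.0
# ...or announce the address clients can actually reach:
cluster-announce-ip 10.254.61.60
# Only needed for Docker/NAT setups where the externally visible ports differ:
# cluster-announce-port 6379
# cluster-announce-bus-port 16379

After that change, cluster nodes should start reporting the real addresses instead of 0.0.0.0.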

slorello89 avatar Oct 25 '23 11:10 slorello89

I propose that we add a minor tweak to explicitly exclude wildcard addresses (with an entry in the log), but: the library can't work with this misconfiguration, so the "fix" here is to not tell the servers to advertise that address.
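
Purely to illustrate the kind of guard being proposed (this is not the library's actual code, just a sketch), detecting a wildcard announce address before dialing it could look something like this:

using System.Net;

// Sketch only: is the host a wildcard address (0.0.0.0 / ::) that a client
// should log and skip rather than try to connect to?
static bool IsWildcardAddress(string host) =>
    IPAddress.TryParse(host, out var ip)
    && (ip.Equals(IPAddress.Any) || ip.Equals(IPAddress.IPv6Any));

Console.WriteLine(IsWildcardAddress("0.0.0.0"));       // True
Console.WriteLine(IsWildcardAddress("::"));            // True
Console.WriteLine(IsWildcardAddress("10.254.61.60"));  // False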

mgravell avatar Oct 25 '23 12:10 mgravell

@mgravell I like the thinking, but I don't think we should do that, simply because it might work even if it seems very wrong. If the response is local and the destination is too, that configuration may be working today, at least on Linux. It's not quite the same, but 0.0.0.0 ~= 127.0.0.1 in functionality, and given containers... I can easily see that guard breaking a working scenario. Thoughts?
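
For what it's worth, that "it might still work" point is easy to reproduce outside Redis. A tiny sketch (the port number is arbitrary); my understanding is that a connect to a 0.0.0.0 destination is generally routed to the local host on Linux, while Windows typically rejects it:

using System.Net;
using System.Net.Sockets;

// Listen on loopback, then use the wildcard address as a *destination*.
var listener = new TcpListener(IPAddress.Loopback, 6390); // arbitrary test port
listener.Start();
try
{
    using var client = new TcpClient();
    client.Connect("0.0.0.0", 6390);
    Console.WriteLine($"Connected via {client.Client.RemoteEndPoint}"); // typical Linux outcome
}
catch (SocketException ex)
{
    Console.WriteLine($"Rejected: {ex.SocketErrorCode}"); // typical Windows outcome
}
finally
{
    listener.Stop();
}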

NickCraver avatar Jan 11 '24 02:01 NickCraver