StackExchange.Redis

Connection established but 'The specified endpoint is not defined'

Open Timmoth opened this issue 1 year ago • 5 comments

I'm running a three-node redis:7.2-alpine cluster on Kubernetes: 1 master, 2 replicas, 3 sentinels. My config is here

In dotnet I am using this code to connect:

    var sentinelConfig = new ConfigurationOptions
    {
        AbortOnConnectFail = false,
        AllowAdmin = true,
        ConnectTimeout = 5000,
        ConnectRetry = 10,
        ServiceName = "mymaster",
        Proxy = Proxy.None,
        Ssl = false,
        KeepAlive = 10,
        ResolveDns = true,
        SyncTimeout = 5000,
        TieBreaker = "",
        Password = redisSettings.Password
    };

    foreach (var sentinel in redisSettings.Sentinels)
    {
        sentinelConfig.EndPoints.Add(sentinel.Host, sentinel.Port);
    }

    var redis = ConnectionMultiplexer.Connect(sentinelConfig, Console.Out);
    services.AddSingleton<IConnectionMultiplexer>(redis);

This works fine against a Redis cluster running in docker compose, and it has also worked on and off in the Kubernetes cluster. When it doesn't work, the endpoint summary still looks correct. As far as I can tell from the logs, it has connected to the sentinels and resolved the correct IP and port for each Redis endpoint; the exception thrown is the only thing that seems out of place:

06:42:16.9712: All 3 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=50,Max=1000), WORKER: (Busy=1,Free=32766,Min=50,Max=32767), POOL: (Threads=11,QueuedItems=0,CompletedItems=131,Timers=2
06:42:16.9714: Endpoint summary:
06:42:16.9716:   10.244.1.68:6379: Endpoint is (Interactive: ConnectedEstablished, Subscription: ConnectedEstablished)
06:42:16.9717:   10.244.0.85:6379: Endpoint is (Interactive: ConnectedEstablished, Subscription: ConnectedEstablished)
06:42:16.9718:   10.244.0.174:6379: Endpoint is (Interactive: ConnectedEstablished, Subscription: ConnectedEstablished)
06:42:16.9719: Task summary:
06:42:16.9720:   10.244.1.68:6379: Returned with success as Standalone primary (Source: Connection race)
06:42:16.9723:   10.244.0.85:6379: Returned with success as Standalone replica (Source: Already connected)
06:42:16.9724:   10.244.0.174:6379: Returned with success as Standalone replica (Source: Already connected)
06:42:16.9725: Election summary:
06:42:16.9727:   Election: Single primary detected: 10.244.1.68:6379
06:42:16.9728: 10.244.1.68:6379: Clearing as RedundantPrimary
06:42:16.9729: Endpoint Summary:
06:42:16.9731:   10.244.1.68:6379: Standalone v7.2.5, primary; 16 databases; keep-alive: 00:00:10; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
06:42:16.9732:   10.244.1.68:6379: int ops=13, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=7, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
06:42:16.9733:   10.244.1.68:6379: Circular op-count snapshot; int: 0+13=13 (1.30 ops/s; spans 10s); sub: 0+7=7 (0.70 ops/s; spans 10s)
06:42:16.9735:   10.244.0.85:6379: Standalone v7.2.5, replica; 16 databases; keep-alive: 00:00:10; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
06:42:16.9736:   10.244.0.85:6379: int ops=14, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=7, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
06:42:16.9738:   10.244.0.85:6379: Circular op-count snapshot; int: 0+14=14 (1.40 ops/s; spans 10s); sub: 0+7=7 (0.70 ops/s; spans 10s)
06:42:16.9739:   10.244.0.174:6379: Standalone v7.2.5, replica; 16 databases; keep-alive: 00:00:10; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
06:42:16.9741:   10.244.0.174:6379: int ops=14, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=7, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
06:42:16.9742:   10.244.0.174:6379: Circular op-count snapshot; int: 0+14=14 (1.40 ops/s; spans 10s); sub: 0+7=7 (0.70 ops/s; spans 10s)
06:42:16.9744: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
06:42:16.9745: Starting heartbeat...
06:42:16.9747: Total connect time: 35 ms
Unhandled exception. System.ArgumentException: The specified endpoint is not defined (Parameter 'endpoint')
   at StackExchange.Redis.ConnectionMultiplexer.GetServer(EndPoint endpoint, Object asyncState) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 1247
   at StackExchange.Redis.ConnectionMultiplexer.GetSentinelMasterConnection(ConfigurationOptions config, TextWriter log) in /_/src/StackExchange.Redis/ConnectionMultiplexer.Sentinel.cs:line 237
   at StackExchange.Redis.ConnectionMultiplexer.SentinelPrimaryConnect(ConfigurationOptions configuration, TextWriter log) in /_/src/StackExchange.Redis/ConnectionMultiplexer.Sentinel.cs:line 134
   at StackExchange.Redis.ConnectionMultiplexer.Connect(ConfigurationOptions configuration, TextWriter log) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 685

This suggests something might be wrong with my config, but the fact that it has worked on the cluster and consistently works locally has me confused.

Does anyone have any ideas, or could someone give me some direction for troubleshooting?

Timmoth, May 24 '24 07:05

This looks like Sentinel is not returning a valid endpoint (or one we recognize) when asked what the master is.

If you connect to a sentinel directly and run SENTINEL MASTER mymaster, what do you get back?
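One way to run that check from .NET is sketched below. It is only an illustration: the host name redis-sentinel and port 26379 are placeholders rather than values from this thread, and the commented-out password line only applies if the sentinels require authentication. It connects to the sentinels themselves instead of going through ServiceName, then prints what each sentinel reports for mymaster.

```csharp
using System;
using System.Net;
using StackExchange.Redis;

// Placeholder sentinel address (an assumption); add one entry per sentinel.
var options = new ConfigurationOptions
{
    AbortOnConnectFail = false,
    // Password = "...",   // assumption: only needed if the sentinels require auth
};
options.EndPoints.Add("redis-sentinel", 26379);

// SentinelConnect talks to the sentinels directly; no ServiceName lookup happens here.
using var sentinels = ConnectionMultiplexer.SentinelConnect(options, Console.Out);

foreach (EndPoint sentinelEndPoint in sentinels.GetEndPoints())
{
    IServer sentinel = sentinels.GetServer(sentinelEndPoint);
    if (!sentinel.IsConnected)
    {
        continue; // skip sentinels we could not reach
    }

    // Address this sentinel currently reports as the primary for "mymaster".
    EndPoint? primary = sentinel.SentinelGetMasterAddressByName("mymaster");
    Console.WriteLine($"{sentinelEndPoint} reports primary: {primary?.ToString() ?? "(none)"}");

    // Full SENTINEL MASTER reply as key/value pairs (ip, port, flags, ...).
    foreach (var pair in sentinel.SentinelMaster("mymaster"))
    {
        Console.WriteLine($"  {pair.Key} = {pair.Value}");
    }

    // Replica addresses this sentinel still knows about.
    foreach (EndPoint replica in sentinel.SentinelGetReplicaAddresses("mymaster"))
    {
        Console.WriteLine($"  replica: {replica}");
    }
}
```

Comparing the reported addresses with the current pod IPs should show whether a sentinel is still advertising an address that no longer belongs to the cluster.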

NickCraver, Jun 08 '24 12:06

@NickCraver We have identified something similar to this; see https://github.com/samcook/RedLock.net/issues/112#issuecomment-2187152737

There are cases where sentinel returns IP addresses that are no longer part of the cluster. The connection multiplexer handles these correctly and aborts them during initialization, but IConnectionMultiplexer.GetEndPoints() still includes them, and calling IConnectionMultiplexer.GetServer(endPoint) for an endpoint that never received an answer throws the ArgumentException.

Is the expectation that IConnectionMultiplexer.GetEndPoints() should return all entries that sentinel knows about?

Tasteful, Jun 26 '24 12:06

I am hitting this issue as well.

@Tasteful please correct my assumptions if they are wrong. This issue is likely to be encountered by anyone calling IConnectionMultiplexer.GetServer(...) while using Redis Sentinel running in Kubernetes. The only current workaround is to add code to catch ArgumentException and skip the endpoint, with the assumption that this indicates it is no longer an active node?

This seems like a pretty serious issue. I have a lot of code that uses SE.Redis and I am hesitant to add this exception handling everywhere. ArgumentException is meant to be avoided, not caught. Aside from the code smell, I lose the ability to distinguish this situation from others that would indicate a bug in consuming code, such as passing an endpoint that was never a valid node.

kmcclellan, Jul 29 '24 16:07

> @Tasteful please correct my assumptions if they are wrong. This issue is likely to be encountered by anyone calling IConnectionMultiplexer.GetServer(...) while using Redis Sentinel running in Kubernetes. The only current workaround is to add code to catch ArgumentException and skip the endpoint, with the assumption that this indicates it is no longer an active node?

Yes, that is correct.
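A minimal sketch of that workaround follows, assuming it is acceptable to skip such endpoints silently or with a log line. GetResolvableServers is a hypothetical extension method, not part of StackExchange.Redis. It only ever passes endpoints that the multiplexer itself returned from GetEndPoints(), so the catch should not hide a caller passing an endpoint that was never valid. It covers calls made from consuming code only; in the original report the exception is thrown inside Connect itself, which this does not address.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using StackExchange.Redis;

public static class MultiplexerExtensions
{
    // Hypothetical helper (not a library API): yields an IServer for every
    // endpoint the multiplexer lists, skipping those GetServer rejects.
    public static IEnumerable<IServer> GetResolvableServers(
        this IConnectionMultiplexer multiplexer,
        Action<EndPoint>? onSkipped = null)
    {
        foreach (EndPoint endpoint in multiplexer.GetEndPoints())
        {
            IServer server;
            try
            {
                server = multiplexer.GetServer(endpoint);
            }
            // Narrow the catch to the exception seen in this thread:
            // "The specified endpoint is not defined (Parameter 'endpoint')".
            catch (ArgumentException ex) when (ex.ParamName == "endpoint")
            {
                onSkipped?.Invoke(endpoint); // e.g. log the skipped address
                continue;
            }

            yield return server;
        }
    }
}

// Usage, assuming `redis` is an already connected IConnectionMultiplexer:
//
//   foreach (IServer server in redis.GetResolvableServers(
//                ep => Console.WriteLine($"Skipping undefined endpoint {ep}")))
//   {
//       Console.WriteLine($"{server.EndPoint}: connected={server.IsConnected}");
//   }
```

Keeping the try/catch in one helper avoids repeating the exception handling at every GetServer call site, which was the concern raised above. Whether a skipped endpoint is truly a stale node remains an assumption.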

Tasteful, Jul 29 '24 18:07

@NickCraver Did you have time to look at the question above about GetEndPoints/GetServer?

@mgravell maybe you have insight into this area as well.

Tasteful, Dec 30 '24 08:12