Connection established but 'The specified endpoint is not defined'
I'm running a three-node redis:7.2-alpine cluster on Kubernetes: 1 master, 2 replicas, 3 sentinels. My config is here
In .NET I am using this code to connect:
var sentinelConfig = new ConfigurationOptions
{
    AbortOnConnectFail = false,
    AllowAdmin = true,
    ConnectTimeout = 5000,
    ConnectRetry = 10,
    ServiceName = "mymaster", // name of the primary as registered with Sentinel
    Proxy = Proxy.None,
    Ssl = false,
    KeepAlive = 10,
    ResolveDns = true,
    SyncTimeout = 5000,
    TieBreaker = "",
    Password = redisSettings.Password
};

// Register every sentinel as an endpoint
foreach (var sentinel in redisSettings.Sentinels)
{
    sentinelConfig.EndPoints.Add(sentinel.Host, sentinel.Port);
}

var redis = ConnectionMultiplexer.Connect(sentinelConfig, Console.Out);
services.AddSingleton<IConnectionMultiplexer>(redis);
This works fine when running a Redis cluster in Docker Compose, and it has also worked on and off in the k8s cluster. When it doesn't work, the endpoint summary still looks correct. As far as I can tell from the logs, it has connected to the sentinels and resolved the correct IP and port for each Redis endpoint. The exception thrown is the only thing that seems out of place:
06:42:16.9712: All 3 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=50,Max=1000), WORKER: (Busy=1,Free=32766,Min=50,Max=32767), POOL: (Threads=11,QueuedItems=0,CompletedItems=131,Timers=2
06:42:16.9714: Endpoint summary:
06:42:16.9716: 10.244.1.68:6379: Endpoint is (Interactive: ConnectedEstablished, Subscription: ConnectedEstablished)
06:42:16.9717: 10.244.0.85:6379: Endpoint is (Interactive: ConnectedEstablished, Subscription: ConnectedEstablished)
06:42:16.9718: 10.244.0.174:6379: Endpoint is (Interactive: ConnectedEstablished, Subscription: ConnectedEstablished)
06:42:16.9719: Task summary:
06:42:16.9720: 10.244.1.68:6379: Returned with success as Standalone primary (Source: Connection race)
06:42:16.9723: 10.244.0.85:6379: Returned with success as Standalone replica (Source: Already connected)
06:42:16.9724: 10.244.0.174:6379: Returned with success as Standalone replica (Source: Already connected)
06:42:16.9725: Election summary:
06:42:16.9727: Election: Single primary detected: 10.244.1.68:6379
06:42:16.9728: 10.244.1.68:6379: Clearing as RedundantPrimary
06:42:16.9729: Endpoint Summary:
06:42:16.9731: 10.244.1.68:6379: Standalone v7.2.5, primary; 16 databases; keep-alive: 00:00:10; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
06:42:16.9732: 10.244.1.68:6379: int ops=13, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=7, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
06:42:16.9733: 10.244.1.68:6379: Circular op-count snapshot; int: 0+13=13 (1.30 ops/s; spans 10s); sub: 0+7=7 (0.70 ops/s; spans 10s)
06:42:16.9735: 10.244.0.85:6379: Standalone v7.2.5, replica; 16 databases; keep-alive: 00:00:10; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
06:42:16.9736: 10.244.0.85:6379: int ops=14, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=7, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
06:42:16.9738: 10.244.0.85:6379: Circular op-count snapshot; int: 0+14=14 (1.40 ops/s; spans 10s); sub: 0+7=7 (0.70 ops/s; spans 10s)
06:42:16.9739: 10.244.0.174:6379: Standalone v7.2.5, replica; 16 databases; keep-alive: 00:00:10; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
06:42:16.9741: 10.244.0.174:6379: int ops=14, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=7, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
06:42:16.9742: 10.244.0.174:6379: Circular op-count snapshot; int: 0+14=14 (1.40 ops/s; spans 10s); sub: 0+7=7 (0.70 ops/s; spans 10s)
06:42:16.9744: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
06:42:16.9745: Starting heartbeat...
06:42:16.9747: Total connect time: 35 ms
Unhandled exception. System.ArgumentException: The specified endpoint is not defined (Parameter 'endpoint')
   at StackExchange.Redis.ConnectionMultiplexer.GetServer(EndPoint endpoint, Object asyncState) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 1247
   at StackExchange.Redis.ConnectionMultiplexer.GetSentinelMasterConnection(ConfigurationOptions config, TextWriter log) in /_/src/StackExchange.Redis/ConnectionMultiplexer.Sentinel.cs:line 237
   at StackExchange.Redis.ConnectionMultiplexer.SentinelPrimaryConnect(ConfigurationOptions configuration, TextWriter log) in /_/src/StackExchange.Redis/ConnectionMultiplexer.Sentinel.cs:line 134
   at StackExchange.Redis.ConnectionMultiplexer.Connect(ConfigurationOptions configuration, TextWriter log) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 685
This suggests something might be wrong with my config, but the fact that it has worked on the cluster, and consistently works locally, has me confused.
Does anyone have any ideas, or could anyone give me some direction for troubleshooting?
This looks like Sentinel is not returning a valid endpoint (or one we recognize) when asked what the master is.
If you connect directly to a sentinel and run SENTINEL MASTER mymaster, what do you get back?
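One way to run that check from .NET rather than redis-cli is sketched below. It is only an illustration: the sentinel host name and port are hypothetical placeholders, and it assumes the sentinels accept the same kind of connection the application already makes. It connects to the sentinels only (no primary lookup) and prints what each sentinel reports for mymaster, including the replicas it still remembers.

```csharp
using System;
using System.Linq;
using System.Net;
using StackExchange.Redis;

// Hypothetical sentinel address; replace with one or more of your own sentinels.
var options = new ConfigurationOptions
{
    EndPoints = { "redis-sentinel:26379" },
    AbortOnConnectFail = false,
    // Password = "...", // set only if your sentinels require authentication
};

// SentinelConnect talks to the sentinels themselves and does not resolve the primary.
using var sentinels = ConnectionMultiplexer.SentinelConnect(options, Console.Out);

foreach (EndPoint ep in sentinels.GetEndPoints())
{
    IServer sentinel = sentinels.GetServer(ep);
    if (!sentinel.IsConnected) continue;

    // Equivalent of: SENTINEL GET-MASTER-ADDR-BY-NAME mymaster
    EndPoint? primary = sentinel.SentinelGetMasterAddressByName("mymaster");
    Console.WriteLine($"{ep} reports primary: {primary}");

    // Equivalent of: SENTINEL REPLICAS mymaster (one key/value list per replica)
    foreach (var replica in sentinel.SentinelReplicas("mymaster"))
    {
        var info = replica.ToDictionary(kv => kv.Key, kv => kv.Value);
        info.TryGetValue("ip", out var ip);
        info.TryGetValue("port", out var port);
        info.TryGetValue("flags", out var flags);
        Console.WriteLine($"  replica {ip}:{port} flags={flags}");
    }
}
```

Comparing the addresses printed here with the current pod IPs should show whether a sentinel is still advertising an address that no longer belongs to the deployment.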
@NickCraver We have identified something similar; see https://github.com/samcook/RedLock.net/issues/112#issuecomment-2187152737
There are cases where Sentinel returns IP addresses that are no longer part of the cluster. The connection multiplexer handles these correctly and aborts them during initialization, but IConnectionMultiplexer.GetEndPoints() still includes them, and calling IConnectionMultiplexer.GetServer(endPoint) for an endpoint that never received an answer throws the ArgumentException.
Is the expectation that IConnectionMultiplexer.GetEndPoints() should return all entries that sentinel knows about?
I am hitting this issue as well.
@Tasteful please correct my assumptions if they are wrong. This issue is likely to be encountered by anyone calling IConnectionMultiplexer.GetServer(...) while using Redis Sentinel running in Kubernetes. The only current workaround is to add code to catch ArgumentException and skip the endpoint, with the assumption that this indicates it is no longer an active node?
This seems like a pretty serious issue. I have a lot of code that uses SE.Redis and am hesitant to add this exception handling uniformly. ArgumentException is meant to be avoided, not caught. Aside from the code smell, I lose the ability to distinguish this situation from others that would indicate a bug in consuming code, such as attempting to pass an endpoint that never was a valid node.
> @Tasteful please correct my assumptions if they are wrong. This issue is likely to be encountered by anyone calling IConnectionMultiplexer.GetServer(...) while using Redis Sentinel running in Kubernetes. The only current workaround is to add code to catch ArgumentException and skip the endpoint, with the assumption that this indicates it is no longer an active node?
Yes, that is correct.
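For anyone applying that workaround, a minimal sketch follows. The extension method name GetReachableServers is made up for this example and is not part of StackExchange.Redis; it simply wraps the catch-and-skip approach described above so the exception handling lives in one place instead of at every call site.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using StackExchange.Redis;

public static class MultiplexerExtensions
{
    // Hypothetical helper: yields a server for each endpoint the multiplexer
    // can resolve, skipping endpoints for which GetServer throws.
    public static IEnumerable<IServer> GetReachableServers(this IConnectionMultiplexer mux)
    {
        foreach (EndPoint endPoint in mux.GetEndPoints())
        {
            IServer server;
            try
            {
                server = mux.GetServer(endPoint);
            }
            catch (ArgumentException)
            {
                // Assumption from this thread: the endpoint was reported by
                // Sentinel but is no longer an active node, so skip it.
                continue;
            }

            if (server.IsConnected)
            {
                yield return server;
            }
        }
    }
}
```

Call sites would then iterate over redis.GetReachableServers() instead of pairing GetEndPoints() with GetServer(endPoint). This does not address the concern about telling a stale endpoint apart from a genuinely invalid argument; it only confines the catch to a single helper.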
@NickCraver Did you have time to check the above question about GetEndpoints/GetServer?
@mgravell maybe you have insight into this area as well.