StackExchange.Redis icon indicating copy to clipboard operation
StackExchange.Redis copied to clipboard

Support Redis Enterprise Active-Active failover out-of-the-box

Open mrmartan opened this issue 3 years ago • 3 comments

Hello,

We are migrating our Redis workloads to RE Active-Active databases and it seems we have to prepare our applications for that as per https://docs.redislabs.com/latest/rs/references/developing-for-active-active/region-application-failover-active-active/. The gist is that the application/client is expected to handle failure detection, failover and failback. I was wondering whether SE.Redis supports that or could support that. I was unsuccessful in finding any information or experiences with SE.Redis and Active-Active databases.

Since we are going to have to handle this anyway I am willing to contribute a PR but I will require guidance as how to approach this in SE.Redis. What would this entail?

mrmartan avatar Jun 03 '21 09:06 mrmartan

Missed this initially, sorry! We don't really support this since Redis doesn't really support this (it's a bit of fakery happening). We do allow you to connect to multiple master servers and the library will pick one at election time, but we don't have a way to say "choose the local one".

Inherently by protocol, multiple master is a split brain situation, for which we have a tiebreaker that can be set (it's set if this library makes the selection) and prefer the tiebreaker-specified one if that tiebreaker exists. That being said, I just moved tiebreakers into the handshake code to simplify things here (no functional changes) are part of some unrelated backlog work in play for v2.5+ of this library. My best guess at a path here is to allow specifying how a tiebreaker works in another way, e.g. if we have multiple masters try the tiebreaker key, and barring that, try the one with much lower latency. That's not 100% since any single connection could have a failure or latency spike and end up connected to the other though.

The fallback for this is we pick the first primary in the list, which means if you configure site A to be RedisA,RedisB and site B to be RedisB,RedisA, they should each prefer their local primary when available. I believe this is what you want here today - so revert the connection strings to put your preferred endpoint first in the list.

Alternatives would be to allow you to get a list of endpoints and data and you make the decision about which one to pick. For example: options.TieBreaker being a Func<List<ServerEndpoint>, ServerEndpoint> or something where you can write any logic you choose and we put the tiebreaker and latency as properties on them - if we can't decide immediately (single primary) then we'd fallback to this function and decide. The exiting tiebreaker code could potentially be implemented as the default for this config option. I'm not saying this is trivial or necessarily works (would have to spike it and see), but it's a potential path.

NickCraver avatar Jan 08 '22 13:01 NickCraver

Thanks for the reply.

I am not sure I follow. I don't think we can treat this in the same way as connecting to multiple masters. Specifically since while the A-A participating clusters can become disconnected, it ensures eventual consistency. There is even an option to enable "causal consistency" which I honestly don't really understand.

What we ended up doing is marking the cluster connection strings as local/primary and remote/secondary up front (we use Kubernetes API and init containers) and then a colleague of mine wrote a library managing and switching multiple ConnectionMultiplexers. It became more involved than I initially hoped for. I believe though that implementing the general idea in StackExchange.Redis would be much better.

For example: options.TieBreaker being a Func<List<ServerEndpoint>, ServerEndpoint> or something where you can write any logic you choose

Maybe even even your suggestion is the way, I can't tell right now.

Here is an exempt from our library:

    internal sealed class RedisConnection : IDisposable
    {
        private readonly ILogger _logger;
        private readonly IEnumerable<IConnectionMultiplexerDecorator> _decorators;
        private readonly ConnectionFactory _connectionFactory;

        public RedisConnection(
            ILogger logger,
            ConnectionDescriptor descriptor,
            IEnumerable<IConnectionMultiplexerDecorator> decorators,
            ConnectionFactory? connectionFactory = null)
        {
            Descriptor = descriptor;
            _logger = logger;
            _decorators = decorators;
            _connectionFactory = connectionFactory ?? ConnectWithDefaultFactory;
        }

        public ConnectionDescriptor Descriptor { get; }
        public bool IsHealthy { get; private set; }
        public IConnectionMultiplexer? InternalConnection { get; private set; }

        public async Task<bool> TryConnect(CancellationToken cancellationToken)
        {
            try
            {
                InternalConnection = await _connectionFactory(Descriptor.ConnectionString);

                ConnectEventListeners(InternalConnection);
                return true;
            }
            catch (RedisConnectionException exc)
            {
                _logger.LogError(exc, "Connecting to {connectionName} ({type}) redis failed",
                    Descriptor.Name, Descriptor.Type);
            }

            return false;
        }

        private void ConnectEventListeners(IConnectionMultiplexer connection)
        {
            foreach (var decorator in _decorators)
            {
                decorator.Decorate(connection);
            }
            
            connection.ConnectionRestored += (_, args) => SetIsHealthyWhenInteractiveConnection(args.ConnectionType, isHealthy: true);
            connection.ConnectionFailed += (_, args) => SetIsHealthyWhenInteractiveConnection(args.ConnectionType, isHealthy: false);

            IsHealthy = connection.IsConnected;
            
            void SetIsHealthyWhenInteractiveConnection(StackExchange.Redis.ConnectionType connectionType, bool isHealthy)
            {
                if (connectionType == StackExchange.Redis.ConnectionType.Interactive)
                {
                    IsHealthy = isHealthy;
                }
            }
        }

        /// <summary>
        /// Connect to redis instance with default configuration optimal to Active-Active solution
        /// </summary>
        /// <param name="connectionString">connection string in StackExchange.Redis format</param>
        /// <remarks>https://stackexchange.github.io/StackExchange.Redis/Configuration</remarks>
        /// <returns>created connection mutliplexer</returns>
        private async Task<IConnectionMultiplexer> ConnectWithDefaultFactory(string connectionString)
        {
            IReadOnlyCollection<string> unsupportedOptions = new[] { "abortConnect=true", "tiebreaker=" };

            foreach (var unsupportedOption in unsupportedOptions)
            {
                if (connectionString.Contains(unsupportedOption, StringComparison.OrdinalIgnoreCase))
                {
                    _logger.LogWarning(
                        "Redis connection string contain unsupported options '{option}'. It will be ignored",
                        unsupportedOption);
                }
            }

            var options = ConfigurationOptions.Parse(connectionString);
            options.ClientName = Environment.MachineName;
            options.AbortOnConnectFail = false;
            // Setting TieBreaker to empty string result in not using tie breaker 
            options.TieBreaker = "";

            return await ConnectionMultiplexer.ConnectAsync(options);
        }

        public void Dispose() => InternalConnection?.Dispose();
    }

If you would be interested in the whole thing let me know and I'll see what I can do.

btw. Redis Ltd. now has an "official" Redis client for .NET built on top of StackExchange.Redis

mrmartan avatar Jan 08 '22 14:01 mrmartan

@slorello89 Would a specific proxy type or a non-specific of "MultiPrimary" work today with this the way it's intended? Similar to how Envoy works today in the code after the changes I made in #1982?

NickCraver avatar Mar 17 '22 00:03 NickCraver