
[NEW] Support for Active/Active replication

Open btzq opened this issue 1 year ago • 36 comments

The problem/use-case that the feature addresses

This feature is currently only available in the enterprise version of Redis. I would like to request that it be made possible in Valkey.

Description of the feature

Because Valkey can at most be set up as an Active-Passive deployment, it is difficult to implement an Active-Active setup for disaster recovery purposes. The dream is to set up a multi-site Active-Active Valkey deployment, so we can achieve an Active-Active production and disaster recovery setup.

Alternatives you've considered

Redis Enterprise? But very expensive...

Additional information

None other than the above

btzq avatar May 17 '24 14:05 btzq

Can you add some additional information about what types of workloads you are trying to support? Active/Active replication is a hard problem with a lot of tradeoffs. It would be helpful to have more information about your specific use case to inform those tradeoffs.

madolson avatar May 19 '24 19:05 madolson

It might only be possible at the proxy level. It would make things more complex if support were added in the core Valkey code.

Amitgb14 avatar May 21 '24 00:05 Amitgb14

Can you add some additional information about what types of workloads you are trying to support? Active/Active replication is a hard problem with a lot of tradeoffs. It would be helpful to have more information about your specific use case to inform those tradeoffs.

In our case, we currently use Redis to hold user session info for SSO purposes.

Because our setup is currently Active/Passive, when we do conduct a failover to the disaster recovery site, all users need to re-login. If there was an active/active setup for Valkey, it would be great, as there wouldn't be any downtime during a failover.

btzq avatar May 21 '24 00:05 btzq

It might only be possible at the proxy level. It would make things more complex if support were added in the core Valkey code.

I'm not quite sure how Redis Enterprise works, but I'm sure their version of Active/Active is done outside the core as well.

btzq avatar May 21 '24 00:05 btzq

In our case, we currently use Redis to hold user session info for SSO purposes.

I presume that means you are using a hash or a string? Are you okay with a timestamp at the object level determining which object wins?

madolson avatar May 21 '24 01:05 madolson

@btzq, I think your problem can be resolved by setting up a proxy in between, or by setting up a replica in a secondary region and failing over during any downtime. It would be great if Valkey considers supporting this in the future.

Amitgb14 avatar May 21 '24 02:05 Amitgb14

Just to add that besides Redis Enterprise, KeyDB (an open-source Redis drop-in) also supports it, and the configuration is quite trivial: --multi-master yes --active-replica yes --replicaof IP PORT

@madolson: Don't know if this helps, or even if your question was in this direction, but from https://docs.keydb.dev/docs/active-rep: "Split Brain: KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed."
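The behavior described there can be sketched as a toy last-write-wins merge. This is a minimal illustration in Python, not KeyDB's actual implementation; all names are made up:

```python
import time

class LWWStore:
    """Toy key-value store where every write records a timestamp.

    On merge (e.g., after a split brain heals), the write with the
    newest timestamp wins for each key.
    """

    def __init__(self):
        self.data = {}  # key -> (value, timestamp)

    def set(self, key, value, ts=None):
        self.data[key] = (value, ts if ts is not None else time.time())

    def get(self, key):
        entry = self.data.get(key)
        return entry[0] if entry else None

    def merge(self, other):
        """Fold another store's data in; the newest timestamp wins per key."""
        for key, (value, ts) in other.data.items():
            if key not in self.data or ts > self.data[key][1]:
                self.data[key] = (value, ts)

# Two "masters" diverge during a partition:
a, b = LWWStore(), LWWStore()
a.set("session:1", "alice", ts=100)
b.set("session:1", "bob", ts=200)   # newer write on the other side
a.merge(b)
print(a.get("session:1"))  # "bob" - the newest write wins
```

This shows why stale data cannot overwrite newer data after reconnection, but also why the resolution happens at the whole-value level.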

IMHO (I may be wrong), I think this is a great feature, as it is a simple and cheap way to get HA without manual intervention (all nodes accept writes, so there is no need for elections) and without the complexity (resources, elections) of cluster/Sentinel setups.

I would really like to use Valkey, but "unfortunately" I need this feature.

cjabrantes avatar Jun 25 '24 17:06 cjabrantes

@cjabrantes I skimmed the document quickly and don't know much about the technical details, so I'm not sure how much work it would be to implement this feature. I know that KeyDB attaches a timestamp to each record that it uses for last-writer-wins; do you know if it replicates the entire object when mutations occur? That seems like a simple way to get last-writer-wins fairly quickly, but it won't work well for certain workloads like distributed counters.

madolson avatar Jun 25 '24 17:06 madolson

@madolson, Sadly I don't know how it behaves.

cjabrantes avatar Jun 25 '24 17:06 cjabrantes

This would be a nice feature to add :) Pity it's been categorized as "not planned".

I'm unsure about the stability and future of KeyDB, as it seems to be somewhat dead and multi-master is not a priority anywhere.

mbfdias avatar Aug 01 '24 20:08 mbfdias

@mbfdias Less "not-planned" and more "need someone to implement it", which is hard given we are a community driven project.

madolson avatar Aug 01 '24 20:08 madolson

Active/Active replication requires each primary to replay the other primaries' replication streams. When the connection between two primaries is broken and the replication stream spills over the in-memory buffers, the primary needs to persist the stream to disk. The same is true when a primary fails over. Valkey stores the replication stream only on the primary, not on replicas, so we would need to solve that problem by persisting the replication stream somewhere accessible to replicas. Persisting the replication stream in a distributed store would fundamentally change the way Valkey replicates today. This is not a small change in my opinion.

An alternative is to index timestamps without storing the replication stream: primaries can ping their timestamps to the other primaries, and each primary then sends all keys with newer timestamps over the replication stream. This won't support modify operations correctly: think of an INCR on both primaries, where only one INCR takes effect once timestamp-based conflict resolution is done, whereas both should have been effective. It can support simple GET/SET for strings, though, and could be achieved within Valkey without fundamental changes.
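The INCR anomaly can be made concrete with a small sketch (Python; names are illustrative). Both primaries increment the same counter during a partition, but value-level timestamp resolution keeps only one of the increments:

```python
# Both primaries start with counter = 10, then a partition occurs.
# Each applies INCR locally, stamping the resulting value with a timestamp.
primary_a = {"counter": (11, 100)}  # value 11, written at t=100
primary_b = {"counter": (11, 105)}  # value 11, written at t=105

def lww_merge(local, remote):
    """Last-write-wins merge at the value level (not the operation level)."""
    for key, (value, ts) in remote.items():
        if key not in local or ts > local[key][1]:
            local[key] = (value, ts)

lww_merge(primary_a, primary_b)
# The correct operation-level result would be 12 (two increments),
# but value-level LWW keeps only the newer write:
print(primary_a["counter"][0])  # 11 - one INCR was silently lost
```

Operation-level merging (replaying both INCRs) would give 12, which is why value-level LWW is only safe for workloads like plain GET/SET.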

parthpatel avatar Sep 24 '24 16:09 parthpatel

@parthpatel Any idea how Redis Enterprise or KeyDB solved that issue? Meaning, are they bulletproof against all types of workloads, such as the one you explained, or are there corner cases where it may not work?

From KeyDB docs: KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed.

By "The newest write will win." it suggests that the problem you described may happen.

what if Valkey would support for now the simple way where "simple get/set " would work fine with a note on docs about other workloads? i guess it would make happy a lot of users with more simple use cases.

Thanks!

cjabrantes avatar Sep 24 '24 16:09 cjabrantes

No technical details here, but I would also like to get this feature. It would greatly save on hardware, since it needs only two hosts for HA instead of the three hosts required for Sentinel. We use KeyDB for that in our company, but it's almost dead already.

gesundes avatar Oct 05 '24 21:10 gesundes

I don't know if it helps, but the KeyDB Active Replica documentation page says the majority of this feature was implemented in this change: https://github.com/Snapchat/KeyDB/commit/a7aa2b074049a130761bc0a98d47130b6a0ff817

esaporski avatar Oct 25 '24 18:10 esaporski

Amazon MemoryDB Multi-Region is now generally available

Today, Amazon Web Services (AWS) announced the general availability of Amazon MemoryDB Multi-Region, a fully managed, active-active, multi-Region database that you can use to build applications with up to 99.999 percent availability, microsecond read, and single-digit millisecond write latencies across multiple AWS Regions. MemoryDB Multi-Region is available for Valkey, which is a Redis Open Source Software (OSS) drop-in replacement stewarded by Linux Foundation.

air3ijai avatar Dec 11 '24 11:12 air3ijai

Amazon MemoryDB Multi-Region is now generally available

Today, Amazon Web Services (AWS) announced the general availability of Amazon MemoryDB Multi-Region, a fully managed, active-active, multi-Region database that you can use to build applications with up to 99.999 percent availability, microsecond read, and single-digit millisecond write latencies across multiple AWS Regions. MemoryDB Multi-Region is available for Valkey, which is a Redis Open Source Software (OSS) drop-in replacement stewarded by Linux Foundation.

How did they achieve that when Valkey OSS doesn't have Active-Active yet?

btzq avatar Dec 11 '24 12:12 btzq

Same question... I found this issue while looking for more information about active-active support in Valkey.

Amazon MemoryDB » MemoryDB Multi-Region » Consistency and conflict resolution

air3ijai avatar Dec 11 '24 12:12 air3ijai

I would love to see Valkey having this feature one day. There is certainly a prominent need for it.

anvu03 avatar Feb 17 '25 05:02 anvu03

Could you elaborate more on the use case for this feature? Is it focused on caching or on a durable data store? What are the acceptable behaviors during failure?

xingbowang avatar Feb 17 '25 06:02 xingbowang

Could you elaborate more on the use case for this feature? Is it focused on caching or on a durable data store? What are the acceptable behaviors during failure?

  • Multiple invocations of the replicaof command will result in adding additional masters, not replacing the current one

  • nodes will not drop their databases when sync'ing with the other masters

  • nodes will merge any writes from the masters with their own internal databases

  • nodes will default to last-operation-wins
    

This means that a replica with multiple masters will contain a superset of the data of all its masters. If two masters have a value with the same key, it may be undefined which value will be taken. If a master deletes a key that exists on another master, the replica will no longer contain a copy of that key.

Multi-master replication is versatile but shines in specific scenarios:

  1. Active-Active geo-distributed workloads. Example: deploying nodes in multiple regions (e.g., US-East, EU-West) where each node serves local clients. Writes in one region replicate to the others, reducing latency for globally distributed apps. Ideal for real-time systems like gaming leaderboards, session stores, or collaborative editing tools needing low-latency writes across regions.

  2. High availability (HA) with write scalability. Eliminates single-master bottlenecks; all nodes accept writes, improving throughput. Suitable for workloads requiring continuous write availability (e.g., IoT telemetry ingestion, logging).

  3. Conflict-tolerant caching. If data can be recomputed or is ephemeral (e.g., web session caching), multi-master replication works well. Conflicts are rare or acceptable since cached data is often transient.

  4. Real-time analytics. Write-heavy metrics/event tracking (e.g., page views, ad clicks) where minor data conflicts are tolerable.

Multi-master replication introduces complexities during failures:

  • Network partitions (split-brain): if nodes disconnect, they continue accepting reads/writes independently. On reconnection, conflicts arise for keys modified on both sides. Nodes use last-write-wins (LWW) with timestamps to resolve conflicts, which may discard older writes. Applications must tolerate potential data loss or design keys to avoid conflicts (e.g., region-prefixed keys: us:user:1, eu:user:1).

  • Data consistency: eventual consistency, meaning changes propagate asynchronously, so reads during replication lag may return stale data. There are no strong consistency guarantees (unlike Redis with WAIT or Redlock).

  • Node failures: surviving nodes continue operating, and failed nodes rejoin and resync automatically (using RDB snapshots or AOF logs). Downtime is minimized, but ensure sufficient replicas to handle the traffic.

When to Use Multi-Master Replication

Good fit:

  • Active-active architectures requiring low-latency reads/writes.

  • Conflict-tolerant or regionally partitioned data.

  • High availability prioritized over strong consistency.
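The region-prefixed-key pattern mentioned above can be sketched like this (Python; the helper names are hypothetical). Each region writes only under its own prefix, so concurrent writes during a partition never target the same key and LWW conflict resolution is never triggered:

```python
def region_key(region, namespace, ident):
    """Build a region-scoped key, e.g. 'us:user:1'."""
    return f"{region}:{namespace}:{ident}"

# Each region writes only under its own prefix, so a split brain
# cannot produce two conflicting writes to the same key.
store = {}
store[region_key("us", "user", 1)] = {"last_seen": "2025-01-01"}
store[region_key("eu", "user", 1)] = {"last_seen": "2025-01-02"}

# A reader that wants a global view merges the per-region entries:
def read_all_regions(store, namespace, ident, regions=("us", "eu")):
    return {r: store.get(region_key(r, namespace, ident)) for r in regions}

print(read_all_regions(store, "user", 1))
```

The trade-off is that reads needing a global view must fan out across the per-region keys and merge in the application.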

mbfdias avatar Feb 17 '25 08:02 mbfdias

Hi mbfdias,

Thank you for providing these details. I think this is a great start. I agree that last-writer-wins is a good way to resolve conflicts, which happen rarely in general. I like the use-case details. They are very technically focused and include a lot of decisions on technical trade-offs, but I want to better understand why a customer would make the choices behind those trade-offs.

E.g., one customer use case could be caching records from another database. Since the customer can always reload data from the other database, which serves as the source of truth, it is OK if some writes are lost during a node failure, as records can always be reloaded.

  • With a multi-region DR architecture, if the cache is not multi-region, then when traffic shifts from one region to another, those requests hit cache misses, which may put more load on the backend database to replenish those cache entries, potentially overwhelming it.
  • If the cache is active-passive, it only works with two regions: two active-passive clusters are created, and each region has one cluster as active and the other as passive. When one region is evacuated, traffic shifts to the other and one of the clusters fails over at the same time, so the healthy region ends up with both clusters in active status. The challenge is that the healthy region suddenly has two cache clusters to query/update, and it is complicated to figure out which cluster to query. This approach also does not scale beyond two regions, since evacuated traffic may shift to multiple adjacent regions but only one region can be active, so it does not work.
  • Only active-active would work well and be simple to use. The traffic evacuated to the other region would have no issue handling read/write traffic.

xingbowang avatar Feb 19 '25 00:02 xingbowang

Only active-active would work well and be simple to use. The traffic evacuated to the other region would have no issue handling read/write traffic.

Couldn't have said it better myself. This covers cases where one wants two, three, or more servers running Valkey locally (for the smallest read/write latency possible) and the ability to write a key on one server and, milliseconds later (depending on network health), read/write/delete the same key on another server of this "multi-master cluster", with no quorum/ZooKeeper issues or worries in case of node failure.

If a node crashes, others continue operating. When the node recovers, it automatically syncs missing/new data from peers.

“Most of the time, the key is there”: Replication is eventually consistent. During normal operation, keys replicate quickly, but temporary stale/missing data is possible during partitions or lag.

mbfdias avatar Feb 19 '25 01:02 mbfdias

This is exactly what I have in mind. I am from the AWS ElastiCache/MemoryDB team. I can see this working for some customers, but I'm not sure whether there are enough customers asking for this kind of trade-off. Thank you for the detailed reply. Would you be interested in collaborating on this effort?

xingbowang avatar Feb 21 '25 17:02 xingbowang

From a technical perspective, there are two major challenges.

  1. Support LWW semantics on existing data structures.
  2. Update the replication logic to allow multiple replication streams on a primary node, and replicate the timestamp as part of command replication.

For No. 1, it requires adding a timestamp to each key and element. The other day I came across the issue that asks for TTL support on hash, set, and sorted set: https://github.com/valkey-io/valkey/issues/640. @ranshid is currently prototyping an approach where metadata is added to each SDS, so that each hash field can have its own TTL. We can collaborate with Ran to see whether we can make this metadata extensible to support a timestamp.

For No. 2, MemoryDB Multi-Region has done this work internally. I can try to see whether we could open-source it.
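The apply side of challenge No. 2 can be sketched in a few lines (Python; none of these names are actual Valkey internals, just an illustration of the idea): each replicated command carries the originating write's timestamp, and the receiving primary applies it only if it is newer than what it holds locally.

```python
# Sketch of challenge No. 2: a primary receiving a peer's replication
# stream applies each command under LWW rules, using the timestamp
# replicated alongside the command. All names here are hypothetical.

def apply_replicated_set(store, key, value, ts):
    """Apply a SET from a peer's replication stream under LWW rules."""
    current = store.get(key)
    if current is None or ts > current[1]:
        store[key] = (value, ts)
        return True   # applied: the incoming write is newer
    return False      # stale write from the peer: discarded

store = {"k": ("local", 50)}
assert apply_replicated_set(store, "k", "remote-new", 60)      # newer: wins
assert not apply_replicated_set(store, "k", "remote-old", 40)  # older: dropped
print(store["k"])  # ('remote-new', 60)
```

The hard parts this sketch omits are exactly the ones discussed above: where the per-key timestamp metadata lives, and how a primary buffers and replays multiple peer streams after a disconnect.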

xingbowang avatar Feb 21 '25 17:02 xingbowang