[NEW] Support for Active/Active replication
The problem/use-case that the feature addresses
This feature is only available in the enterprise version of Redis. I would like to request that it be made possible in Valkey.
Description of the feature
Because Valkey can at most be set up as an Active-Passive deployment, it is difficult to implement an Active-Active setup for disaster recovery purposes. The dream is to set up a multi-site Active-Active Valkey deployment, so we can achieve an Active-Active production and disaster recovery setup.
Alternatives you've considered
Redis Enterprise? But very expensive...
Additional information
None other than the above
Can you add some additional information about what types of workloads you are trying to support. Active/Active replication is a hard problem with a lot of tradeoffs. It would be helpful to have more information about your specific usecase to inform those tradeoffs.
It might be possible only at the proxy level. It would become more complex if support were added in the core Valkey code.
> Can you add some additional information about what types of workloads you are trying to support. Active/Active replication is a hard problem with a lot of tradeoffs. It would be helpful to have more information about your specific usecase to inform those tradeoffs.
In our case, we currently use Redis to hold user session info for SSO purposes.
Because our setup is currently Active/Passive, when we conduct a failover to the disaster recovery site, all users need to log in again. If there were an Active/Active setup for Valkey, it would be pretty great, as there wouldn't be any downtime during a failover.
> It might be possible only at the proxy level. It would become more complex if support were added in the core Valkey code.
I'm not quite sure how Redis Enterprise works, but I'm sure their version of Active/Active is done outside the core as well.
> In our case, we currently use Redis to hold user session info for SSO purposes.
I presume that means you are using a hash or a string? Are you okay with a timestamp at the object level determining which object wins?
@btzq, I think your problem can be resolved by setting up a proxy in between, or by setting up a replica in a secondary region and simply failing over during any downtime. It would be great if Valkey considers supporting this in the future.
Just to add that besides Redis Enterprise, KeyDB (an open-source Redis drop-in) also supports it, and the configuration is quite trivial: `--multi-master yes --active-replica yes --replicaof IP PORT`
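To make the flags above concrete, here is a minimal two-node sketch based only on the KeyDB options quoted in this thread; the IP addresses and port are placeholder assumptions, and this is KeyDB, not Valkey:

```shell
# Node A (assumed address 10.0.0.1): stay writable while replicating from B.
keydb-server --multi-master yes --active-replica yes --replicaof 10.0.0.2 6379

# Node B (assumed address 10.0.0.2): the mirror-image configuration.
keydb-server --multi-master yes --active-replica yes --replicaof 10.0.0.1 6379
```

Each node treats the other as a master while continuing to accept its own writes, which is what makes the pair active-active.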
@madolson: Don't know if it helps, or even if your question was in this direction, but from https://docs.keydb.dev/docs/active-rep:

> Split Brain: KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped, and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed.
IMHO (I may be wrong), I think this is a great feature, as it is a simple and cheap way to get HA without manual intervention (all nodes accept writes, so no election is needed) and without the complexity (resources, elections) of Cluster/Sentinel setups.
I would really like to use Valkey, but "unfortunately" I need this feature.
@cjabrantes I skimmed the document quickly and don't know much about the technical details, so I'm not sure how much work it would take to implement this feature. I know that KeyDB attaches a timestamp to each record, which it uses for last-writer-wins; do you know if it replicates the entire object when mutations occur? That seems like a simple way to get last-writer-wins fairly quickly, but it won't work well for certain workloads like distributed counters.
@madolson, sadly I don't know how it behaves.
This would be a nice feature to add :) A pity it's been categorized as "not planned".
I'm unsure about the stability and future of KeyDB, as it seems to be kind of dead, and multi-master is not a priority anywhere.
@mbfdias Less "not planned" and more "need someone to implement it", which is hard given we are a community-driven project.
Active/Active replication requires each primary to replay the other primaries' replication streams. When the connection between two primaries breaks and the replication stream spills over the in-memory buffers, the primary needs to persist the stream to disk. The same is true when a primary fails over. Valkey does not store the replication stream on the replica, only on the primary, so we would need to solve that problem by persisting the replication stream somewhere accessible to the replica. Persisting the replication stream in a distributed store would fundamentally change the way Valkey replicates today. This is not a small change, in my opinion.
An alternative is to index timestamps without storing the replication stream: primaries can ping their timestamps to the other primaries, and each primary sends over the replication stream all keys with timestamps newer than the other primary's. This won't support modify operations correctly - think of INCR on both primaries, where only one INCR takes effect once timestamp-based conflict resolution is done, whereas both should have been effective. It can support simple GET/SET for strings, though, and can be achieved within Valkey without fundamental changes.
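The INCR anomaly described above can be shown with a small sketch. This is illustrative Python, not Valkey code: each primary holds `{key: (value, timestamp)}`, and after a partition the stores are merged with last-writer-wins:

```python
def lww_merge(a, b):
    """Merge two {key: (value, timestamp)} stores; the newest timestamp wins."""
    merged = dict(a)
    for key, (value, ts) in b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

# During a partition, both primaries run INCR on a counter that was 5.
primary_a = {"counter": (6, 100.0)}  # INCR applied locally at t=100.0
primary_b = {"counter": (6, 100.5)}  # INCR applied locally at t=100.5

merged = lww_merge(primary_a, primary_b)

# LWW keeps only the newest single write: counter ends up 6, but the two
# INCRs together should have produced 7. Plain SET workloads are unaffected,
# since the newest SET is exactly the value the application intended.
print(merged["counter"][0])
```

This is why timestamp-based conflict resolution is safe for set-style writes but silently drops the effect of concurrent commutative operations like INCR.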
@parthpatel Any idea how Redis Enterprise or KeyDB solved that issue? Meaning, are they bulletproof against all types of workloads like the one you explained, or are there corner cases where it may not work?
From KeyDB docs:
> KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed.
The phrase "The newest write will win" suggests that the problem you described may happen.
What if Valkey supported, for now, the simple approach where "simple get/set" works fine, with a note in the docs about other workloads? I guess it would make a lot of users with simpler use cases happy.
Thanks!
No technical details here, but I would also like to get this feature. It would greatly save hardware, since it needs only two hosts for HA instead of three hosts for Sentinel. We use KeyDB for that in our company, but it's almost dead already.
I don't know if it helps, but the KeyDB Active Replica documentation page says the majority of this feature was implemented in this change: https://github.com/Snapchat/KeyDB/commit/a7aa2b074049a130761bc0a98d47130b6a0ff817
Amazon MemoryDB Multi-Region is now generally available
Today, Amazon Web Services (AWS) announced the general availability of Amazon MemoryDB Multi-Region, a fully managed, active-active, multi-Region database that you can use to build applications with up to 99.999 percent availability, microsecond read, and single-digit millisecond write latencies across multiple AWS Regions. MemoryDB Multi-Region is available for Valkey, which is a Redis Open Source Software (OSS) drop-in replacement stewarded by Linux Foundation.
How did they achieve that when Valkey OSS doesn't have Active-Active yet?
Same question... I found this issue when looking for more information about active-active support in Valkey.
Amazon MemoryDB » MemoryDB Multi-Region » Consistency and conflict resolution
I would love to see Valkey have this feature one day. There is certainly a prominent need for it.
Could you elaborate more on the use case of this feature? Is it focusing on caching or durable data store? What are the acceptable behaviors during failure?
> Could you elaborate more on the use case of this feature? Is it focusing on caching or durable data store? What are the acceptable behaviors during failure?
- Multiple invocations of the replicaof command will result in adding additional masters, not replacing the current one
- nodes will not drop their database when sync'ing with the other masters
- nodes will merge any writes from the masters with their own internal database
- nodes will default to last operation wins
This means that a replica with multiple masters will contain a superset of the data of all its masters. If two masters have a value with the same key, it may be undefined which key will be taken. If a master deletes a key that exists on another master, the replica will no longer contain a copy of that key.
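The superset semantics described above can be sketched in a few lines. This is a toy model of the assumed merge behavior from the KeyDB excerpt, not actual replication code:

```python
# Two masters with partially overlapping keyspaces.
master_a = {"us:user:1": "alice", "shared": "from-a"}
master_b = {"eu:user:1": "bob", "shared": "from-b"}

# A replica of both masters ends up holding the union of their keyspaces.
replica = {**master_a, **master_b}

assert set(replica) == {"us:user:1", "eu:user:1", "shared"}

# For "shared", the surviving value here is "from-b" only because master_b
# merged last; with real multi-master replication the winner may be
# undefined from the application's point of view.
print(replica["shared"])
```

Applications that cannot tolerate an undefined winner need to keep keyspaces disjoint (e.g., region-prefixed keys) or rely on timestamps for deterministic resolution.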
Multi-master replication is versatile but shines in specific scenarios:

- Active-Active Geo-Distributed Workloads: Deploying nodes in multiple regions (e.g., US-East, EU-West) where each node serves local clients. Writes in one region replicate to others, reducing latency for globally distributed apps. Ideal for real-time systems like gaming leaderboards, session stores, or collaborative editing tools needing low-latency writes across regions.
- High Availability (HA) with Write Scalability: Eliminates single-master bottlenecks; all nodes accept writes, improving throughput. Suitable for workloads requiring continuous write availability (e.g., IoT telemetry ingestion, logging).
- Conflict-Tolerant Caching: If data can be recomputed or is ephemeral (e.g., web session caching), multi-master replication works well. Conflicts are rare or acceptable since cached data is often transient.
- Real-Time Analytics: Write-heavy metrics/event tracking (e.g., page views, ad clicks) where minor data conflicts are tolerable.
Multi-master replication introduces complexities during failures:

- Network Partitions (Split-Brain): If nodes disconnect, they continue accepting reads/writes independently. On reconnection, conflicts arise for keys modified on both sides. Nodes use last-write-wins (LWW) with timestamps to resolve conflicts, which may discard older writes. Applications must tolerate potential data loss or design keys to avoid conflicts (e.g., region-prefixed keys: us:user:1, eu:user:1).
- Data Consistency: Eventual consistency; changes propagate asynchronously. Reads during replication lag may return stale data. There are no strong consistency guarantees (unlike Redis with WAIT or Redlock).
- Node Failures: Surviving nodes continue operating. Failed nodes rejoin and resync automatically (using RDB snapshots or AOF logs). Downtime is minimized, but ensure sufficient replicas to handle traffic.
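The region-prefixed key scheme mentioned above deserves a concrete sketch: if each region only writes keys under its own prefix, cross-region LWW conflicts cannot occur. The helper name below is illustrative, not a Valkey API:

```python
def session_key(region: str, user_id: int) -> str:
    """Build a region-scoped key so each region owns a disjoint keyspace."""
    return f"{region}:user:{user_id}"

# us-east and eu-west each write only under their own prefix, so the same
# logical user never produces a write conflict between regions.
assert session_key("us", 1) == "us:user:1"
assert session_key("eu", 1) == "eu:user:1"
assert session_key("us", 1) != session_key("eu", 1)
```

The trade-off is that reads of "the user's session" may need to consult both prefixes, which is acceptable when each user is routed to a home region.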
When to Use Multi-Master Replication

Good fit:

- Active-active architectures requiring low-latency reads/writes.
- Conflict-tolerant or regionally partitioned data.
- High availability prioritized over strong consistency.
Hi mbfdias,
Thank you for providing these details. I think this is a great start. I agree that last-writer-wins is a good way to resolve conflicts, which happen rarely in general. I like the use case details; they are very technically focused and include a lot of decisions on technical trade-offs. But I want to better understand the customer motivations behind these technical trade-offs.
E.g., one customer use case could be caching records from another database. Since the customer can always reload data from the other database, which serves as the source of truth, it is okay if some writes are lost during node failure, as records can always be reloaded from the other database.
- With a multi-region DR architecture, if the cache is not multi-region, then when traffic shifts from one region to another, those requests hit cache misses, which may put more load on the backend database to replenish those cache entries, overwhelming the database.
- If the cache is active-passive, it only works with 2 regions. Two active-passive clusters are created; each region has one cluster as active and the other as passive. When one region is evacuated, traffic shifts to the other, and one of the clusters fails over at the same time, so the healthy region ends up with both clusters in active status. The challenge is that the healthy region suddenly has 2 cache clusters to query/update, and it is complicated to figure out which cluster to query. This solution also does not scale beyond 2 regions, as evacuated traffic may shift to multiple adjacent regions while only one region can be active. So this solution does not work.
- Only active-active would work well and be simple to use. Traffic evacuated to the other region would have no issue handling read/write traffic.
> Only active-active would work well and be simple to use. Traffic evacuated to the other region would have no issue handling read/write traffic.
Couldn't have said it better myself. Think of cases where one wants two, three, or more servers running Valkey locally (for the smallest possible read/write latency), being able to write a key on one server and, milliseconds later depending on network health, read/write/delete the same key on another server of this "multi-master cluster", with no quorum/ZooKeeper issues or worries in case of node failure.
If a node crashes, others continue operating. When the node recovers, it automatically syncs missing/new data from peers.
“Most of the time, the key is there”: Replication is eventually consistent. During normal operation, keys replicate quickly, but temporary stale/missing data is possible during partitions or lag.
This is exactly what I have in mind. I am from the AWS ElastiCache/MemoryDB team. I can see this working for some customers, but I'm not sure whether there are enough customers asking for this kind of trade-off. Thank you for the detailed reply. Would you be interested in collaborating on this effort?
From a technical perspective, there are 2 major challenges:

- Support LWW semantics on existing data structures.
- Update the replication logic to allow multiple replication streams on a primary node, and replicate the timestamp as part of command replication.
For No.1, it requires adding a timestamp on each key and element. The other day, I came across the issue about supporting TTLs on hash, set, and sorted set: https://github.com/valkey-io/valkey/issues/640. @ranshid is currently prototyping an approach where metadata is added on each SDS, so that each hash field could have its own TTL. We can collaborate with Ran to see whether we can make this metadata extensible to support a timestamp.
For No.2, MemoryDB Multi-Region has done this work internally. I can try to see whether we could open source it.
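Challenge No.2 above can be sketched at a high level: a primary applies commands arriving from multiple peer replication streams, each command carrying the timestamp assigned at its origin, and resolves conflicts per key with last-writer-wins. This is illustrative Python with made-up names, not Valkey's replication code:

```python
from dataclasses import dataclass

@dataclass
class ReplicatedCommand:
    key: str
    value: str
    origin_ts: float  # timestamp assigned by the originating primary

class Primary:
    def __init__(self):
        self.store = {}   # key -> value
        self.key_ts = {}  # key -> timestamp of the winning write

    def apply(self, cmd: ReplicatedCommand):
        # Accept the write only if it is at least as new as what we hold;
        # older writes from lagging streams are discarded.
        if cmd.origin_ts >= self.key_ts.get(cmd.key, float("-inf")):
            self.store[cmd.key] = cmd.value
            self.key_ts[cmd.key] = cmd.origin_ts

p = Primary()
# Streams from two peers may arrive out of order; LWW keeps the newest write.
p.apply(ReplicatedCommand("session:1", "from-peer-a", origin_ts=2.0))
p.apply(ReplicatedCommand("session:1", "from-peer-b", origin_ts=1.0))
print(p.store["session:1"])  # the t=2.0 write from peer A wins
```

A real implementation would also need to persist per-key timestamps (challenge No.1) and handle clock skew between primaries, which this sketch ignores.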