Raft Implementation

Open VivekSainiEQ opened this issue 3 years ago • 5 comments

The goal is to implement the Raft protocol for strongly consistent synchronous replication (relevant blog post here).

Any issues pertaining to situations that Raft would solve will be closed and tracked here instead.

VivekSainiEQ avatar Aug 18 '21 20:08 VivekSainiEQ

I think RAFT is fascinating technology, but I wonder if it's actually useful (especially given all the work involved in implementing it) for the majority of KeyDB users.

People pick Redis and KeyDB because it's so blazing fast. Nothing is faster. But when you have global applications, the latency of having global reads but a central write is Redis/KeyDB's Achilles' heel. The network, not the CPU, has generally always been the bottleneck. The current active-active implementation is blazing fast even across the globe. It just lacks any kind of conflict-resolution/consensus. I'm using it right now in a project, and outside of this, I like it.

Implementing a RAFT solution that requires confirmation from a majority of nodes before accepting a write seems like the slowest possible feature you could implement in one of the fastest projects in the world, especially when the other nodes are positioned all around the world. It really only helps with disaster recovery.
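To make the latency concern concrete, here is a minimal sketch of how long a leader waits for a majority of acknowledgments. It assumes a simplified model where commit time is just the round trip to the slowest follower the leader still needs. The function name and the RTT figures are hypothetical and are not taken from KeyDB or any measurement.

```python
def quorum_commit_latency(follower_rtts_ms):
    """Time (ms) until a leader has acks from a majority of the cluster.

    Simplified model: the leader counts as one vote, so in a cluster of
    len(follower_rtts_ms) + 1 nodes it waits only for the fastest
    followers needed to reach a majority.
    """
    cluster_size = len(follower_rtts_ms) + 1
    majority = cluster_size // 2 + 1
    acks_needed = majority - 1  # the leader already has its own vote
    if acks_needed == 0:
        return 0.0
    # The commit completes when the acks_needed-th fastest follower replies.
    return sorted(follower_rtts_ms)[acks_needed - 1]


# Hypothetical round-trip times in milliseconds.
same_region = [2, 3]               # 3-node cluster in one region
global_pops = [80, 150, 220, 300]  # 5-node cluster spread worldwide

print(quorum_commit_latency(same_region))
print(quorum_commit_latency(global_pops))
```

Under this model the write waits on the median-ish follower rather than the slowest one, so the cost depends heavily on where the nearest majority sits.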

People who need fast DBs globally with active-active are happy with the cost of eventual consistency if it means blazing fast activity in every single location. Every datacenter is the primary; it's quite wonderful.

But with RAFT, a random node is master and every node seeks its acknowledgment to complete writes. It defeats the purpose of why most people seek master-master setups.

I understand implementing CRDT is tremendously hard. But I'd rather it just be canceled than for you to spend your very valuable time working on a feature that most will probably not use. It might be useful in a low-latency environment like 3 nodes in California, but it's unlikely to have any global use case. And applications are only becoming more and more global. The next TikTok or Pokemon Go could be powered by KeyDB, but not with a RAFT implementation. All speed is lost.

Thank you for letting me express my $0.02. KeyDB is wild and amazing 🙏

krunkosaurus avatar Aug 18 '21 21:08 krunkosaurus

Hi @krunkosaurus

I agree RAFT won't be the right choice for every user, but I think there are some major scenarios - especially around blpush/blpop where you really need that strong consistency. Because of KeyDB's excellent single node performance we're uniquely suited to a RAFT implementation and will be able to achieve much higher performance than other implementations. When RAFT is complete it will be an optional mode of our Active Replication feature so you don't have to pay the cost if you don't need the strong consistency.

As for CRDTs we are able to operate in that way with SETS/GETS via active replication and we will need to invest in enabling this for the more complicated datatypes. We're already seeing a lot of use there and it is definitely something we will be looking at.
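For readers unfamiliar with the idea, a last-writer-wins register is one of the simplest CRDT-style merges for plain SET/GET values. The sketch below is illustrative only: the class, its fields, and the tie-breaking rule are assumptions for this example, not a description of KeyDB's replication code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """A last-writer-wins register: one value plus the metadata used to merge."""
    value: str
    timestamp: int  # hypothetical logical or wall-clock time of the write
    node_id: str    # hypothetical replica identifier, used only to break ties

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # The write with the higher (timestamp, node_id) pair wins, so every
        # replica picks the same winner regardless of merge order.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


# Two replicas accept conflicting writes to the same key.
tokyo = LWWRegister(value="blue", timestamp=10, node_id="tokyo")
paris = LWWRegister(value="green", timestamp=12, node_id="paris")

# Merging in either order converges on the same value.
print(tokyo.merge(paris).value, paris.merge(tokyo).value)
```

The trade-off is that the losing write is silently discarded, which is acceptable for simple keys but is part of why richer datatypes need more elaborate merge rules.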

I'd also mention a few things about KeyDB itself. This was started in 2019 with the help of friends although I was the only major developer for a long time. We've been able to create a team around KeyDB and though we've had some growing pains I'm really excited about being able to take on some of these larger challenges.

Hope that helps explain our thinking :)

-John

JohnSully avatar Aug 23 '21 15:08 JohnSully

Thanks @JohnSully ! You are the thankless hero we surely don't deserve! I am aware you have been largely a one-man operation for a long time much like Redis itself. Appreciate the work you do.

I agree there are practical use cases for RAFT. Any application that exists only regionally would benefit from it and the added throughput and DR.

I am happy that the CRDT work may possibly continue afterwards! Redis Enterprise is a beast to set up manually (and that's probably intentional).

krunkosaurus avatar Aug 23 '21 17:08 krunkosaurus

I'll chime in here. We currently use Redis Enterprise solely for "stable" active-active replication and CRDTs for our 10-POP geo-distributed cluster. It does function; however, in higher-latency WAN environments RS is effectively beta software (that costs $100k+/year). In our <1 year of usage we discovered several critical bugs (one of them will remain unpatched until the Oct release) that cause memory to grow uncontrollably after a node drops out of a cluster because of network issues. This requires manual intervention. Redis RS also struggles with high latency: the POP in Australia gets decoupled from Brazil quite often, due to sporadic high latency between them.

If stable CRDTs can be implemented without running the behemoth stack that is required to run Redis RS (3 nodes in each POP of the geo-cluster), we would be more than happy to pay the same price for this to KeyDB (although cheaper prices would certainly be welcome).

@JohnSully I'm not sure how valuable this is from your perspective, but have a look at HashiCorp's Serf project. We currently use this in our stack as an "out of band" instrumentation/metadata channel for passing data and triggering events on geo-distributed POPs. Back when we were evaluating KeyDB for our purposes, we built out a management system to instrument KeyDB replication and recover from "bad states" using application code that used Serf as a transport. You also get handy health checks out of it, "for free", just by virtue of the gossip protocol. It was too brittle being in the application layer (and we only spent a couple of weeks on it), but if it were part of the database application itself, it could be useful. Dynamic cluster membership and the ability to emit events and queries, with the option to trigger methods on remote machines, are quite handy. This holds even if you don't have direct connectivity between all nodes.

Perhaps leveraging Serf's gossip protocol alongside KeyDB can help with cluster initialization and out-of-band management, helping it self-heal and arrive at a "good state" when failures occur.

yegors avatar Aug 23 '21 18:08 yegors

@JohnSully Maybe you could save time by using this implementation: https://github.com/eBay/NuRaft

rnz avatar Feb 01 '22 22:02 rnz

Hey there! Really loving KeyDB! Is there any news on this?

hendrikheil avatar Mar 23 '23 09:03 hendrikheil