Distributed transactions - clarification needed

dumblob opened this issue 1 year ago • 4 comments

I see the design documents are not written in much detail yet, but I would like to know in advance what the plans are for transactions and, later, distributed transactions.

Could you elaborate?

Redis (which cachegrand itself compares to) has the policy that a request is transactional if it is satisfiable by the instance processing the request, i.e. any (Lua) script submitted with a request is transactional. Distributed Redis also does not offer any guarantees when it comes to distributed transactions (which Redis actually does not support at all, IIRC).

Thanks!

dumblob avatar Sep 15 '22 07:09 dumblob

As it's not yet defined in detail I can't go deeper; we are currently evaluating different approaches.

In general cachegrand supports simple transactions (e.g. no row isolation and no rollback midway through; they can only be aborted, but the changes already made will stay in place). They are used for each write/delete operation, and this behaviour will apply to distributed operations as well.
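To illustrate the abort-only semantics described above (all names here are hypothetical, purely a sketch; this is not cachegrand's actual API): writes take effect immediately, and aborting only stops further operations, it does not undo what was already applied.

```python
class SimpleTransaction:
    """Sketch of abort-only transaction semantics: no rollback of applied writes."""

    def __init__(self, store: dict):
        self.store = store
        self.aborted = False

    def write(self, key, value):
        if self.aborted:
            raise RuntimeError("transaction aborted")
        # Writes take effect immediately; there is no undo log.
        self.store[key] = value

    def abort(self):
        # Aborting only prevents further operations in this transaction;
        # changes already made stay in place.
        self.aborted = True


store = {}
tx = SimpleTransaction(store)
tx.write("a", 1)
tx.abort()
try:
    tx.write("b", 2)  # rejected: the transaction was aborted
except RuntimeError:
    pass

print(store)  # → {'a': 1}  (the earlier write to "a" stays in place)
```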

While the impressively high numbers provided by cachegrand are cool, I think that having these functionalities makes more sense, even if a performance penalty has to be paid.

We will be able to have that because we will implement active-active replication. Of course this means that running a transactional operation on a node that doesn't own the data will be slower; to limit that, we will provide a smart proxy that acts as a bridge between standard Redis clients and cachegrand, with the necessary capabilities to figure out which node should handle what and redirect each request to the right node.
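A minimal sketch of the routing logic such a proxy could use (the shard count, hash function and node mapping are illustrative assumptions, not cachegrand's actual implementation):

```python
import zlib

NUM_SHARDS = 16384
# Illustrative shard-to-node assignment; a real proxy would learn the
# ownership map from the cluster topology.
NODES = ["node-a", "node-b", "node-c"]


def shard_for_key(key: str) -> int:
    # Hash the key into a fixed shard space (CRC32 here for simplicity).
    return zlib.crc32(key.encode()) % NUM_SHARDS


def node_for_key(key: str) -> str:
    # Map the shard to the node that owns it, so the proxy can forward
    # the request to the right node.
    return NODES[shard_for_key(key) % len(NODES)]


print(node_for_key("user:42"))
```

The key point is that routing is deterministic: the same key always hashes to the same shard, so the proxy can forward a request without consulting the data nodes first.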

So, to summarise, if you have a command or some code that has to operate on keys owned by different nodes, it will work just fine (although it will of course be slower).

danielealbano avatar Sep 15 '22 08:09 danielealbano

Ok, thanks!

What you describe would be great - finally a competitor to Amazon's Redis-compatible store that offers seamless distributed transactions with full ACID guarantees (not just Byzantine or eventual consistency) to standard Redis clients.

Please close this issue once you have written the design documents and documentation - I will then take a look :wink:.

dumblob avatar Sep 16 '22 17:09 dumblob

To be honest my aim is to build a base layer which offers these kinds of functionalities (including network kernel bypass, a time-series on-disk database and storage disk bypass) to upper modules, and then have upper modules for Redis, Kafka, GraphQL, etc.

Basically you can pipe your data in via streams using Redis or Kafka, apply computation via WebAssembly-supported languages, and then expose it via other Kafka streams, Redis or GraphQL. Everything in place, without having to send data back and forth and with lower resource consumption!

It will not take a day, but that's the vision and end goal; for now the Redis support helps to smooth out a lot of the required components and makes cachegrand usable and ready to be expanded.

danielealbano avatar Sep 16 '22 18:09 danielealbano

@dumblob I think I have finally put together and mostly investigated all the features I want to cover with data replication. I will put together some high-level documentation, but in a few words the high-level concepts are:

  • the hashtable is sharded over a fixed (configurable) number of shards, in the range of 8192 / 16384 / 32768, etc.
    • although I am also investigating random slicing; I need to analyze the downsides and the performance impact
    • probably in that case I would end up using an ART (Adaptive Radix Tree)
  • each shard can be associated with one or more nodes which will host the data, to achieve active-active replication
  • shards will be load balanced across the nodes automatically
    • the load balancer will assign the shards based on the resources available, the resources consumed over a window of time, and the resources it would cost to migrate the shard itself
    • the load balancer will take into account fault domains identified with rack and region awareness (of course in the cloud it's not possible to have rack awareness)
  • the internal database already uses locking over chunks of 14 buckets, so we will extend this feature to the active-active replica; this will impact the performance but will guarantee that there will be no madness with the data
    • although at a later stage we will implement a last-write-wins approach where the data will be versioned using virtual wall-clock times (e.g. a wall-clock time of the cluster) and the last version will end up being replicated everywhere (this will not be optimal with commands that operate on the data but will be the fastest)
  • the protocol will be QUIC, to reduce the impact of the OS, as it's based on UDP
    • down the line we will add support for XDP; QUIC uses UDP so it's not too big a problem
    • I was thinking of implementing my own protocol over UDP, but then there are other things to deal with (e.g. fragmentation), so I decided to avoid it
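The last-write-wins idea mentioned above could be sketched like this (names are hypothetical; a plain integer stands in for the cluster-wide virtual wall clock):

```python
from dataclasses import dataclass


@dataclass
class VersionedValue:
    value: str
    version: int  # virtual wall-clock time assigned by the cluster


def merge_lww(local: VersionedValue, remote: VersionedValue) -> VersionedValue:
    # Last write wins: the replica keeps whichever version is newer.
    # Ties favour the local copy, so the merge is deterministic.
    return remote if remote.version > local.version else local


a = VersionedValue("old", version=100)
b = VersionedValue("new", version=250)
print(merge_lww(a, b).value)  # → new
```

As noted above, this is fast because replicas never coordinate on a write, but commands that read-modify-write the data can lose updates, which is why it would be an opt-in later stage rather than the default.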

danielealbano avatar Oct 19 '22 22:10 danielealbano

Hi, I am sorry for not reacting earlier. Thanks a lot for the comprehensive insight into the potential future. Has anything changed since October?

If not, I will take a look at the current implementation (if any) to find out more about the inner workings :wink:.

dumblob avatar Jun 12 '23 18:06 dumblob

Hey @dumblob,

Honestly we haven't implemented the support for HA & LB yet; we had a few plan adjustments (and also we were too excited and built our own cloud service to test out the approach we had in mind) :)

danielealbano avatar Jun 12 '23 18:06 danielealbano

That sounds encouraging!

Feel free to keep us updated on the current direction of this cool endeavour :+1:

dumblob avatar Jun 13 '23 07:06 dumblob

Thanks @dumblob!

Well, basically, before focusing on the LB & HA support we decided to give priority to data ingestion support, implementing some of the Kafka APIs and the Redis Streams APIs, and to data processing support, implementing WebAssembly support for data processing, probably starting with Python.

At least we will be able to show the whole end to end flow of what can be done with cachegrand!

Basically you will be able to ingest, process and serve the data straight away with just one single platform, and depending on your needs, in real time as well.

danielealbano avatar Jun 13 '23 13:06 danielealbano

> At least we will be able to show the whole end to end flow of what can be done with cachegrand!

Yes, of course, quick wins first!

> Basically you will be able to ingest, process and serve the data straight away with just one single platform, and depending on your needs, in real time as well.

Sounds almost too good to be true :wink:.

Btw. speaking of "depending on your needs, in real time as well" - will this be a global "setting" or rather a per request setting?

dumblob avatar Jun 13 '23 21:06 dumblob

> Btw. speaking of "depending on your needs, in real time as well" - will this be a global "setting" or rather a per request setting?

To be honest, being able to enable or disable ACID properties at run-time is not a thing, unless the database is designed to operate in such a way AND the protocol supports the ability to specify these kinds of settings, which is not the case with RESP(2/3).

But, something we would like to do is the ability to specify the storage backend & the replication setting per key where:

  • if the store is not persistent on disk, you shouldn't really care too much about ACID vs non-ACID for performance; it's in memory, it's going to be blazing fast
  • if you want high performance you don't want data replication

Of course it will be possible to set the storage backend in advance, via special cachegrand commands over the Redis protocol, but they will have to be run before the keys are created, otherwise the keys will be saved in the default storage backend.

This is how cachegrand will be able to handle this scenario.
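The per-key behaviour described above could look roughly like this (class and method names are hypothetical, purely to illustrate "configure before first write"; the actual cachegrand commands are not defined yet):

```python
DEFAULT_BACKEND = "memory"


class KeyspaceSketch:
    """Sketch: the storage backend is chosen per key, once, before creation."""

    def __init__(self):
        self.backends = {}  # key -> chosen storage backend
        self.data = {}      # key -> value

    def set_backend(self, key, backend):
        # Must run before the key is created; after the first write the
        # key is already pinned to a backend and cannot be moved.
        if key in self.data:
            raise RuntimeError("key already created; backend is fixed")
        self.backends[key] = backend

    def set(self, key, value):
        # The first write pins the key to its configured (or default) backend.
        self.backends.setdefault(key, DEFAULT_BACKEND)
        self.data[key] = value


ks = KeyspaceSketch()
ks.set_backend("audit:log", "disk")  # opt into the persistent backend up front
ks.set("audit:log", "entry-1")
ks.set("cache:tmp", "v")             # no prior choice: default backend applies
print(ks.backends["cache:tmp"])       # → memory
```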

danielealbano avatar Jun 13 '23 23:06 danielealbano

> Of course it will be possible to set the storage backend in advance, via special cachegrand commands over the Redis protocol, but they will have to be run before the keys are created otherwise they will be saved in the default storage backend.

So basically for every single key one can choose ACID/non-ACID but only once and only before the first write. Right?

If my understanding is correct, then this would actually cover many of the use cases I had in mind when submitting the "per request" proposal.

I am so eager to see what cachegrand will become :wink:!

dumblob avatar Jun 14 '23 19:06 dumblob