
Are garnet benchmarks reasonable comparisons with a smart redis deployment?

Open · TimLovellSmith opened this issue 1 year ago · 1 comment

Someone raised what seem like some intelligent questions on reddit, and I figure they could be better asked here. The key one that I'm also curious about: are the Garnet multithreading benchmarks just comparing against a single Redis instance, instead of a Redis cluster with one Redis instance per core, which is typically the most efficient way to deploy Redis?

"They're log scales, so it's actually 1 to 2 orders of magnitude higher throughput. Are they doing the same thing all the other drop-in Redis clones do and comparing their multi-core/multi-threaded version against a single instance of Redis? Or are they comparing in-process to across-the-network? When you run an instance of Redis per core on the same machine, the multi-threaded drop-in replacements rarely show any real performance improvement."

(Ref to:

https://microsoft.github.io/garnet/docs/benchmarking/results-resp-bench

"We provision two Azure Standard F72s v2 virtual machines (72 vcpus, 144 GiB memory each) running Linux (Ubuntu 20.04), with accelerated TCP enabled."

)

TimLovellSmith avatar Apr 01 '24 16:04 TimLovellSmith

Comparing systems requires all servers to expose the same API so that a common client benchmark can run against all of them. Pointing the client benchmark to a single IP address and port will naturally make Redis work as a single process (instance) on the server side.

We can try to run K instances of OSS Redis on the server, but then someone needs to account for the challenges and costs of redirecting client key requests to the correct instance of Redis.

  • Every client will need to establish K connections - one to each Redis server - leading to a quadratic blow-up in the number of connections and a consequent loss of scalability.
  • Also, for a given client batch of requests, different items in the batch will need to be sent to different shards, which means the effectiveness of batching is reduced K-fold.
  • As data skew increases, one of the shards can become the bottleneck and further impact performance.
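To make the first two costs concrete, here is some back-of-the-envelope arithmetic. The figures (300 client sessions, 72 shards, 64-command batches) are illustrative choices mirroring a 72-vcpu machine, not numbers from the Garnet benchmark:

```python
# Illustrative arithmetic for the sharding costs listed above.
# All concrete numbers here are hypothetical.

def connection_count(num_clients: int, num_shards: int) -> int:
    # Every client must hold one connection to each shard.
    return num_clients * num_shards

def batch_per_shard(batch_size: int, num_shards: int) -> float:
    # Uniformly hashed keys split each client batch across all shards.
    return batch_size / num_shards

print(connection_count(300, 1))    # single instance: 300 connections
print(connection_count(300, 72))   # 72 per-core shards: 21600 connections
print(batch_per_shard(64, 1))      # full 64-command batch to one server
print(batch_per_shard(64, 72))     # under 1 command per shard per batch
```

With 72 shards, each 64-command pipeline degrades to less than one command per shard per round trip, which is why the effectiveness of batching is described as reduced K-fold.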

Alternatively, we would need to provision a proxy that accepts requests and directs them to the correct shard, but the proxy then becomes the bottleneck due to the extra data copies and the server-side data shuffling and movement costs. This is why modern systems such as Garnet and Dragonfly support multi-threading natively. On a single modern machine, bringing requests to the data via server-side shuffling (in a shard-by-core design) is generally worse than bringing data to the requests (in a shared-memory design), as the latter is limited only by raw CPU/memory cache-coherence speeds, does not require up-front data partitioning or extra metadata/routing management, and is resilient to workload skew. See the Shadowfax paper for details.
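For concreteness, this is the key-to-shard routing work that either every client or the proxy must perform on each request in a sharded design. The sketch below uses Redis Cluster's slot scheme (CRC16/XMODEM modulo 16384) and, for simplicity, ignores {hash tag} extraction:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16 with polynomial 0x1021 (XMODEM), as used by Redis Cluster.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Redis Cluster maps every key to one of 16384 slots.
    # This sketch skips the {hash tag} rule.
    return crc16_xmodem(key.encode()) % 16384

def shard_for_key(key: str, num_shards: int) -> int:
    # Assume slots are divided into contiguous equal ranges, one per shard.
    return hash_slot(key) * num_shards // 16384

# Reference value from the Redis Cluster spec: CRC16("123456789") == 0x31C3.
print(hash_slot("123456789"))  # 12739
```

Per-key hashing and shard dispatch like this is pure routing overhead that a single multi-threaded instance never pays.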

KeyDB and Dragonfly perform similar comparisons against standalone Redis, ensuring that one instance (process) of each competing system is being compared. In a realistic workload there will be hundreds of sessions connected to every instance, so benchmarking this way better reflects production use.

In the Garnet experiments, KeyDB and Dragonfly were also set up to run as a single instance while using multiple threads on the server side, similar to how Garnet operates.

Finally, our measurements capture the end-to-end client-server communication costs both for latency and throughput experiments, as measured on the client side with the server running on a different VM. That is the only way to correctly compare systems in this category.

badrishc avatar Apr 01 '24 20:04 badrishc

Comparing systems requires all servers to expose the same API so that a common client benchmark can run against all of them. Pointing the client benchmark to a single IP address and port will naturally make Redis work as a single process (instance) on the server side.

@badrishc Yes and no. It wouldn't be an endpoint-to-endpoint, shard-to-shard, or connection-to-connection comparison, but it might still be a pretty good apples-to-apples comparison of end-to-end scenarios. That is possibly more relevant to the community trying to understand how much perf they get for some $ of managed service than a forced shard-to-shard performance comparison.

You could run a smart client that handles both cluster mode and non-cluster mode, like StackExchange.Redis, against a single Redis cluster endpoint, from which it will dynamically discover the cluster topology with CLUSTER NODES. Then compare that against running the same client against a single Garnet endpoint.
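As a rough sketch of the topology-discovery step such a smart client performs: parse the CLUSTER NODES reply into a slot-range-to-node map, then route each key's slot to the owning node. The node IDs and addresses below are made up; the field layout follows the CLUSTER NODES reply format:

```python
def parse_cluster_nodes(reply: str) -> dict:
    # Build a {(first_slot, last_slot): "ip:port"} map from a CLUSTER NODES
    # reply. Fields per line: id, addr@cport, flags, master-id, ping-sent,
    # pong-recv, config-epoch, link-state, then the served slot ranges.
    slot_map = {}
    for line in reply.strip().splitlines():
        fields = line.split()
        addr = fields[1].split("@")[0]
        if "master" not in fields[2].split(","):
            continue  # replicas serve no slots here
        for token in fields[8:]:
            if token.startswith("["):
                continue  # slot currently migrating; skipped in this sketch
        # a token is either "lo-hi" or a single slot number
            lo, _, hi = token.partition("-")
            slot_map[(int(lo), int(hi or lo))] = addr
    return slot_map

def node_for_slot(slot_map: dict, slot: int) -> str:
    for (lo, hi), addr in slot_map.items():
        if lo <= slot <= hi:
            return addr
    raise KeyError(slot)

# Hypothetical two-node topology.
sample = (
    "a1b2c3 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-8191\n"
    "d4e5f6 10.0.0.2:6379@16379 master - 0 1712170000 2 connected 8192-16383\n"
)
topology = parse_cluster_nodes(sample)
print(node_for_slot(topology, 100))    # 10.0.0.1:6379
print(node_for_slot(topology, 12739))  # 10.0.0.2:6379
```

A benchmark built on such a client would exercise the real routing path an application pays for, which is the point of the end-to-end comparison being proposed.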

That's an example of how the same client API can run the same benchmarking code against different topologies. And it is a relevant scenario for the many people trying to figure out whether they can switch.

Another possible comparison is to run any standard Redis benchmarking tool against a Redis Enterprise clustered endpoint, where the server should accept the same protocol as Garnet even though it's sharded on the backend.

TimLovellSmith avatar Apr 03 '24 22:04 TimLovellSmith