
Save to Disk in Bio Thread

nitaicaro opened this issue 9 months ago • 4 comments

Introduction

This PR introduces a new feature that enables replicas to perform disk-based synchronization on a dedicated background thread (Bio thread). Benchmarking results demonstrate significant improvements in synchronization duration. In extreme cases, this optimization allows syncs that would have previously failed to succeed.

Problem Statement

Some administrators prefer the disk-based full synchronization mode for replicas. This mode allows a replica to keep serving its existing data to clients while it downloads the RDB file.

Valkey's predominantly single-threaded nature creates a challenge: serving client read requests and saving data from the socket to disk are not truly concurrent operations. In practice, the replica alternates between processing client requests and replication data, leading to inefficient behavior and prolonged sync durations, especially under high load.

Proposed Solution

To address this, the solution offloads the task of downloading the RDB file from the socket to a background thread. This allows the main thread to focus exclusively on handling client read requests while the background thread handles communication with the primary.
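
To make the hand-off concrete, here is a minimal, self-contained pthread sketch of the idea. It is illustrative only: the actual PR integrates with Valkey's existing bio.c job queue rather than spawning a raw thread, and the names below (rdb_save_job, start_rdb_save_job, the temp file path) are hypothetical.

/* Sketch only: a background thread performs the blocking socket-to-disk
 * copy of the RDB stream while the main thread keeps serving clients. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct {
    int repl_fd;          /* socket connected to the primary */
    const char *tmp_path; /* temp RDB file to write to */
} rdb_save_job;

/* Runs on the background thread: blocking read()/write() copy loop. */
static void *rdb_save_to_disk(void *arg) {
    rdb_save_job *job = arg;
    int out_fd = open(job->tmp_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (out_fd == -1) { perror("open"); free(job); return NULL; }

    char buf[16 * 1024];
    ssize_t n;
    while ((n = read(job->repl_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w == -1) { perror("write"); goto done; }
            off += w;
        }
    }
done:
    fsync(out_fd); /* a blocking fsync is acceptable off the main thread */
    close(out_fd);
    /* The real patch would now notify the main thread (e.g. via a pipe)
     * so it can load the downloaded RDB and finish the sync. */
    free(job);
    return NULL;
}

/* Called on the main thread when the full-sync payload starts arriving;
 * it returns to the event loop immediately. */
static void start_rdb_save_job(int repl_fd) {
    rdb_save_job *job = malloc(sizeof(*job));
    if (!job) return;
    job->repl_fd = repl_fd;
    job->tmp_path = "temp-rdb-download.rdb";
    pthread_t tid;
    pthread_create(&tid, NULL, rdb_save_to_disk, job);
    pthread_detach(tid); /* fire-and-forget, for the sketch's purposes */
}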

Benchmarking Results

Potential for Improvement

In theory, this optimization can yield unbounded improvement in sync duration. By eliminating competition between client read events and socket communication with the primary (i.e., the events that handle the RDB download), sync times become independent of load: the main thread handles only client reads, while the background thread focuses on downloading the RDB from the primary, allowing the system to perform consistently even under high load.

The full valkey-benchmark commands can be found in the appendix below.

Sync Duration with Feature Disabled (times in seconds)

  • 16 threads, 64 clients: 172 seconds
  • 32 threads, 128 clients: 436 seconds
  • 48 threads, 192 clients: 710 seconds

Sync Duration with Feature Enabled (times in seconds)

  • 16 threads, 64 clients: 33 seconds (80.8% improvement)
  • 32 threads, 128 clients: 33 seconds (92.4% improvement)
  • 48 threads, 192 clients: 33 seconds (95.3% improvement)
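
Worth noting: with the feature enabled, sync time is a constant 33 seconds across all load levels. Assuming the transferred RDB payload is close to the 6 GB of loaded data (see the appendix), that works out to roughly 6 GB / 33 s ≈ 185 MB/s of sustained download throughput, independent of client read load.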


Alternative Solutions Considered

IO Threads
IO threads do not have an advantage over Bio threads in this case: the save-to-disk job is rare (most likely no more than a few executions in a replica's lifetime), and there is never more than one execution at a time. Bio threads make more sense for a single, slow, long-running operation.

io_uring
For a single connection, io_uring doesn't provide much of a performance boost, because its primary advantage comes from batching many I/O operations together to reduce syscall overhead. With just one connection, there aren't enough operations to benefit significantly from these optimizations.

Prioritizing the primary's socket in the event loop
This approach would help, but less effectively than a Bio thread: the main thread would still need to spend time handling read requests, which limits the benefit. It could be more useful on smaller instance types with limited CPU cores.

Appendix

Benchmarking Setup

  • Client machine: AWS c5a.16xlarge
  • Server machines: AWS c5a.2xlarge
# Step 1: Fill the primary and replica DBs with 6GB of data:

./valkey-benchmark -h <host> -p <port> -l -d 128 -t set -r 30000000 --threads 16 -c 64

# Step 2: Initiate heavy read load on the replica:

./valkey-benchmark -h <host> -p <port> -t get -r 30000000 --threads <t> -c <c> -n 1000000000 -P <P>

# Step 3: Enable/disable the config controlling the new feature:

./valkey-cli -h <host> -p <port> config set replica-save-to-disk-in-bio-thread <yes/no>

# Step 4: Initiate sync:

./valkey-cli -h <replica host> -p <replica port> replicaof <primary host> <primary port>

nitaicaro · Feb 26 '25

Codecov Report

Patch coverage is 88.57143% with 40 lines in your changes missing coverage. Please review.
Project coverage is 71.52%. Comparing base (8df8c84) to head (ca663c1).
Warning: report is 47 commits behind head on unstable.

Files with missing lines | Patch % | Lines
src/replication.c | 87.76% | 40 Missing
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1784      +/-   ##
============================================
+ Coverage     71.49%   71.52%   +0.03%     
============================================
  Files           123      123              
  Lines         67179    67348     +169     
============================================
+ Hits          48028    48174     +146     
- Misses        19151    19174      +23     
Files with missing lines | Coverage Δ
src/bio.c | 84.93% <100.00%> (+0.48%) ↑
src/server.c | 88.09% <100.00%> (+0.02%) ↑
src/server.h | 100.00% <ø> (ø)
src/replication.c | 87.06% <87.76%> (+0.33%) ↑

... and 11 files with indirect coverage changes


codecov[bot] · Feb 26 '25

Adding the major-decision-pending label. Even though this is not a breaking change, I think adding work on another thread should be carefully considered.

ranshid · May 05 '25

@Fusl In the weekly meeting we discussed your workload while reviewing this feature and deciding whether to enable it by default. Do you think you would want to have this feature enabled by default?

madolson · May 19 '25

> Do you think you would want to have this feature enabled by default?

I currently use repl-diskless-load swapdb everywhere I have clients reading from replicas and performance matters, so I don't think this feature would be relevant to my current workload. I'm wondering what the overhead/performance penalty of this feature is in environments where only a single CPU core is available (such as resource-limited containerized environments), with no, light, and heavy reads being done from replicas. Do we see an overall decrease in performance due to two threads contending for the same CPU core?

I think more testing for this feature needs to be done on real cluster setups before we turn it on by default.

Fusl · May 19 '25

Thanks @Fusl, good points.

This change is unrelated to repl-diskless-load swapdb.

For disk-based sync, I tested this on a single core, using a smaller version of the benchmark from the PR. We see a drop in throughput during the sync, but it completes successfully within a few seconds, in a scenario where, without the Bio thread, the sync simply hangs.

Overall, even on a single core, this looks like a clear win for stability.

We can explain this easily: the Bio thread can now get up to 50% of CPU time to handle the RDB download, whereas before the download events were queued behind a batch of client read events (and got nowhere near 50%).

P.S. Without Bio, writing to disk on the main thread can block in uninterruptible sleep due to kernel I/O waits, stalling the engine. As a bonus, the main thread is now less likely to enter this state.

nitaicaro · May 26 '25

Please fix DCO errors.

xbasel · May 28 '25