
add diskless-fallback option for repl-diskless-load configuration

Open advaMosh opened this issue 1 year ago • 2 comments

Description

When a replica with the default value of repl-diskless-load ("disabled") performs a full sync with the primary, it first writes the RDB data to disk. Only after completing this step does the replica load the data into memory.

Disconnections during a disk-based sync can sometimes result in endless sync retry loops and stale data on the replica.
The longer the primary and replica remain disconnected and out of sync, the staler the replica's data becomes, and the less beneficial a disk-based sync is as time progresses.

A common example is short writes (insufficient disk space) on the replica: if the replica doesn't have enough disk space for the RDB file, the sync will fail repeatedly.

This change also addresses another issue: replica client output buffer (COB) overrun on the primary side. In many cases, high traffic persists during subsequent sync attempts, leading to repeated COB overruns. Under the assumption that diskless syncs are faster, the fallback reduces the risk of exceeding the buffer size limit.
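
As a rough configuration sketch of the proposed behavior (the value name below is only an assumption taken from the issue title; disabled, on-empty-db, and swapdb are the values that exist today):

    # Existing values for repl-diskless-load:
    #   disabled    - write the RDB to disk first, then load it (default)
    #   on-empty-db - diskless load, but only when the replica's dataset is empty
    #   swapdb      - keep the current dataset in memory while parsing from the socket
    #
    # Proposed value (name assumed from the issue title): behave like "disabled",
    # but fall back to a diskless load after a disk-based sync attempt has been aborted.
    repl-diskless-load diskless-fallback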

Code Changes

server.h and server.c

Added the following new fields (see the sketch after this list):

  1. sync_aborts_total - incremented when we detect that a disconnection happened during a sync attempt; for informational purposes only.

  2. last_sync_aborted - same as the previous field, but reset after a successful sync; it is used when choosing the next sync strategy.

  3. replica_last_sync_type - indicates which strategy was used in the last sync.
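
A minimal sketch of how these fields might look in the server struct (field names follow the list above; the types and placement are assumptions, not copied from the actual patch):

    /* Sketch only -- assumed types and placement, not the actual patch. */
    struct valkeyServer {
        /* ... existing replication state ... */
        long long sync_aborts_total;   /* Total aborted sync attempts (informational only). */
        int last_sync_aborted;         /* Set when a sync is aborted, cleared after a successful sync. */
        int replica_last_sync_type;    /* Strategy used in the last sync: disk-based or diskless. */
        /* ... */
    };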


replication.c

  1. In the existing code, when a sync fails, cancelReplicationHandshake is called, which in turn calls replicationAbortSyncTransfer. There, last_sync_aborted is set to true and sync_aborts_total is incremented.
  2. Added the function shouldFallbackToDisklessLoad, which implements the fallback inside useDisklessLoad - the logic by which the replica chooses its sync type (see the sketch after this list).
  3. The replica sends a newline to the primary node once when flushing its data, and periodically while the RDB load progresses. If the primary loses the connection to the replica, the connection state is changed to CONN_STATE_ERROR.

Added a check of the connection state after the RDB load finishes successfully, before the load is marked as complete by setting server.repl_state = REPL_STATE_CONNECTED and the connection state changes. Only after that is the sync marked as successful and last_sync_aborted reset.
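
A rough sketch of how the fallback decision and the post-load connection check might fit together (the enum names and helpers below are assumed from the existing replication code, not copied from the patch):

    /* Sketch only -- assumed logic, not the actual patch. */
    static int shouldFallbackToDisklessLoad(void) {
        /* Fall back only when the previous sync attempt was aborted,
         * e.g. by a disconnect or a short write during a disk-based sync. */
        return server.last_sync_aborted;
    }

    static int useDisklessLoad(void) {
        /* Existing decision: swapdb, or on-empty-db with an empty dataset. */
        int enabled = (server.repl_diskless_load == REPL_DISKLESS_LOAD_SWAPDB) ||
                      (server.repl_diskless_load == REPL_DISKLESS_LOAD_WHEN_DB_EMPTY &&
                       dbTotalServerKeyCount() == 0);
        /* Proposed addition: after an aborted disk-based sync, prefer a diskless load. */
        if (!enabled && shouldFallbackToDisklessLoad()) enabled = 1;
        /* ... existing safety checks omitted ... */
        return enabled;
    }

    /* After the RDB load finishes successfully, and before setting
     * server.repl_state = REPL_STATE_CONNECTED, verify the connection is
     * still healthy; only then clear last_sync_aborted. */
    if (connGetState(conn) == CONN_STATE_CONNECTED) {
        server.last_sync_aborted = 0;
        server.repl_state = REPL_STATE_CONNECTED;
    }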

Testing

Added a new test in tests/unit/cluster/diskless-sync-fallback.tcl which checks only the fallback logic: if a disk-based sync has failed, the next sync attempt should be diskless (the RDB received straight into the parser).

advaMosh avatar Jun 05 '24 13:06 advaMosh

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.21%. Comparing base (71dd85d) to head (67dfbc0). Report is 29 commits behind head on unstable.

:exclamation: Current head 67dfbc0 differs from pull request most recent head adf4fa4

Please upload reports for the commit adf4fa4 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #601      +/-   ##
============================================
+ Coverage     70.09%   70.21%   +0.11%     
============================================
  Files           110      110              
  Lines         60041    60085      +44     
============================================
+ Hits          42085    42187     +102     
+ Misses        17956    17898      -58     
Files               Coverage           Δ
src/config.c        78.06% <ø>         (-0.26%) :arrow_down:
src/replication.c   87.44% <100.00%>   (+0.34%) :arrow_up:
src/server.c        88.58% <100.00%>   (-0.35%) :arrow_down:
src/server.h        100.00% <ø>        (ø)

... and 31 files with indirect coverage changes

codecov[bot] avatar Jun 06 '24 06:06 codecov[bot]

I do not think the top comment explains the main issue we are trying to solve here. Disk-based load on the replica side is meant to help the replica keep serving reads and avoid bigger data loss while a sync is in progress. Falling back to diskless is mostly helpful in cases where we have a disk-related issue (e.g. short writes), not as a way to bridge COB overrun issues, which are better handled by different solutions (some of which we currently lack, like throttling on COB growth). I would suggest rewriting the top comment.

ranshid avatar Jun 11 '24 10:06 ranshid

@ranshid Do we still want to push this?

madolson avatar Sep 05 '25 18:09 madolson