valkey Replica flush the old data after RDB file is ok in disk-based replication

Call emptyData right before rdbLoad to prevent errors in the middle and we drop the replication stream and leaving an empty database.

Aug 20 '24 10:08 enjoy-binbin

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.63%. Comparing base (7795152) to head (f4316a0). Report is 80 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/replication.c	57.14%	3 Missing :warning:

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #926      +/-   ##
============================================
+ Coverage     70.37%   70.63%   +0.25%     
============================================
  Files           112      114       +2     
  Lines         61505    61670     +165     
============================================
+ Hits          43286    43559     +273     
+ Misses        18219    18111     -108

Files with missing lines	Coverage Δ
src/replication.c	`87.29% <57.14%> (+0.14%)`	:arrow_up:

... and 41 files with indirect coverage changes

Aug 20 '24 10:08 codecov[bot]

@enjoy-binbin did we have a test failure here, or should we introduce new test for this case?

Aug 20 '24 12:08 ranshid

did we have a test failure here, or should we introduce new test for this case?

emmm, i don't think we have a test failure here, we don't have test that will cover this case, i test it in local. We should actually add a new test to verify this, but that will require us to add some debug code. (or i can do try to make the rename fail, i can take a try)

Aug 20 '24 15:08 enjoy-binbin

It's not a breaking change? The behavior change is that if the sync fails, the old data is still there, instead of deleted. It seems better. Nobody relies on the old behaviour I think...?

I don't think it is a breaking change (but somehow it can be?). Yes, actually it is, if the following fails, we can still keep the data in memory.

        /* Make sure the new file (also used for persistence) is fully synced
         * (not covered by earlier calls to rdb_fsync_range). */
        if (fsync(server.repl_transfer_fd) == -1) {
            ...
            cancelReplicationHandshake(1);
            return;
        }

        /* Rename rdb like renaming rewrite aof asynchronously. */
        int old_rdb_fd = open(server.rdb_filename, O_RDONLY | O_NONBLOCK);
        if (rename(server.repl_transfer_tmpfile, server.rdb_filename) == -1) {
            ...
            cancelReplicationHandshake(1);
            if (old_rdb_fd != -1) close(old_rdb_fd);
            return;
        }
        /* Close old rdb asynchronously. */
        if (old_rdb_fd != -1) bioCreateCloseJob(old_rdb_fd, 0, 0);

        /* Sync the directory to ensure rename is persisted */
        if (fsyncFileDir(server.rdb_filename) == -1) {
            ...
            cancelReplicationHandshake(1);
            return;
        }

Aug 20 '24 15:08 enjoy-binbin

@valkey-io/core-team Do you think this is a breaking change?

Aug 20 '24 15:08 zuiderkwast

I do not think it is a break change, but it changes some functions call position, but I can not find the reason why we should do this way.

Aug 22 '24 14:08 hwware

i am going to merge this one, i don't think it is a breaking change, does any of you have other concerns?

Sep 13 '24 02:09 enjoy-binbin