Replica flush the old data after RDB file is ok in disk-based replication
Call emptyData right before rdbLoad to prevent errors in the middle and we drop the replication stream and leaving an empty database.
Codecov Report
Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
Project coverage is 70.63%. Comparing base (
7795152) to head (f4316a0). Report is 80 commits behind head on unstable.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/replication.c | 57.14% | 3 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## unstable #926 +/- ##
============================================
+ Coverage 70.37% 70.63% +0.25%
============================================
Files 112 114 +2
Lines 61505 61670 +165
============================================
+ Hits 43286 43559 +273
+ Misses 18219 18111 -108
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/replication.c | 87.29% <57.14%> (+0.14%) |
:arrow_up: |
@enjoy-binbin did we have a test failure here, or should we introduce new test for this case?
did we have a test failure here, or should we introduce new test for this case?
emmm, i don't think we have a test failure here, we don't have test that will cover this case, i test it in local. We should actually add a new test to verify this, but that will require us to add some debug code. (or i can do try to make the rename fail, i can take a try)
It's not a breaking change? The behavior change is that if the sync fails, the old data is still there, instead of deleted. It seems better. Nobody relies on the old behaviour I think...?
I don't think it is a breaking change (but somehow it can be?). Yes, actually it is, if the following fails, we can still keep the data in memory.
/* Make sure the new file (also used for persistence) is fully synced
* (not covered by earlier calls to rdb_fsync_range). */
if (fsync(server.repl_transfer_fd) == -1) {
...
cancelReplicationHandshake(1);
return;
}
/* Rename rdb like renaming rewrite aof asynchronously. */
int old_rdb_fd = open(server.rdb_filename, O_RDONLY | O_NONBLOCK);
if (rename(server.repl_transfer_tmpfile, server.rdb_filename) == -1) {
...
cancelReplicationHandshake(1);
if (old_rdb_fd != -1) close(old_rdb_fd);
return;
}
/* Close old rdb asynchronously. */
if (old_rdb_fd != -1) bioCreateCloseJob(old_rdb_fd, 0, 0);
/* Sync the directory to ensure rename is persisted */
if (fsyncFileDir(server.rdb_filename) == -1) {
...
cancelReplicationHandshake(1);
return;
}
@valkey-io/core-team Do you think this is a breaking change?
I do not think it is a break change, but it changes some functions call position, but I can not find the reason why we should do this way.
i am going to merge this one, i don't think it is a breaking change, does any of you have other concerns?