File changes are not propagated to the final volume in a cascading geo-replication setup
Description of problem:
I have three gluster zones and create a gluster volume (named xfs) in each zone with the following layout:
Zone A: 2 nodes, replica 2
Zone B: 2 nodes, replica 2
Zone C: 1 node, distribute
I then set up cascading geo-replication from Zone A -> Zone B -> Zone C.
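For context, the two sessions were set up roughly as follows (a sketch; the hostnames and the root-to-root setup are assumptions, and the actual deployment may use a dedicated geo-replication user as in the workaround command at the end of this report):
# Zone A -> Zone B session, created and started from a Zone A node
$ sudo gluster volume geo-replication xfs gfs-b1::xfs create push-pem
$ sudo gluster volume geo-replication xfs gfs-b1::xfs start
# Zone B -> Zone C session, created and started from a Zone B node
$ sudo gluster volume geo-replication xfs gfs-c1::xfs create push-pem
$ sudo gluster volume geo-replication xfs gfs-c1::xfs start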
I mount the xfs volume in Zone A and copy a 5.5 GB Windows 22H2 ISO file (sha1sum: a3f1a71bb41517e2ef07f09093bbcdca4f5c313c) to the mount point. After a while I can see the geo-replication sessions copy the file from Zone A to Zone B, and then from Zone B to Zone C.
Once the geo-replication is complete, I use sha1sum to check the file in all three zones' volumes and get the same hash value. So far so good.
Next, I modify the last byte of the file (using hexedit). Geo-replication from Zone A to Zone B always succeeds, but Zone B to Zone C mostly fails. I wait overnight hoping the change will propagate to Zone C, but it does not; checking the file content with sha1sum in all three zones shows that Zone C still has the old content.
If I instead update the 1st or 2nd byte of the file, geo-replication works and the change propagates through Zones A, B and C.
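For reference, this is roughly how I compare the zones (a sketch; it assumes the volume is mounted at /mnt/xfs on one node per zone, and the ISO file name is a placeholder):
# Compare the checksum of the file on each zone's FUSE mount
$ for host in gfs-a1 gfs-b1 gfs-c1; do ssh $host 'hostname; sha1sum /mnt/xfs/win-22h2.iso'; done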
These log messages keep showing up in Zone B every minute:
==> /var/log/glusterfs/geo-replication/xfs_gfs-c1_xfs/gsyncd.log <==
[2023-07-13 06:29:42.456403] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=3}, {num_files=1}, {return_code=23}, {duration=0.1733}]
[2023-07-13 06:29:42.456600] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689229770']}]
[2023-07-13 06:29:43.466236] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=1}, {num_files=1}, {return_code=23}, {duration=0.1731}]
[2023-07-13 06:29:43.466490] E [primary(worker /mnt/brick/abc):1388:process] _GPrimary: changelogs could not be processed completely - moving on... [{files=['CHANGELOG.1689229770']}]
[2023-07-13 06:29:43.478993] I [primary(worker /mnt/brick/abc):1429:process] _GPrimary: Entry Time Taken [{UNL=0}, {RMD=0}, {CRE=0}, {MKN=0}, {MKD=0}, {REN=0}, {LIN=0}, {SYM=0}, {duration=0.0000}]
[2023-07-13 06:29:43.479052] I [primary(worker /mnt/brick/abc):1441:process] _GPrimary: Data/Metadata Time Taken [{SETA=0}, {meta_duration=0.0000}, {SETX=0}, {XATT=0}, {DATA=0}, {data_duration=1689229783.4790}]
[2023-07-13 06:29:43.479136] I [primary(worker /mnt/brick/abc):1451:process] _GPrimary: Batch Completed [{mode=live_changelog}, {duration=9.9611}, {changelog_start=1689229770}, {changelog_end=1689229770}, {num_changelogs=1}, {stime=(1689229769, 0)}, {entry_stime=(1689212054, 0)}]
New CHANGELOG files also keep appearing in the directory /mnt/brick/abc/.glusterfs/changelogs/2023/07/13.
The exact command to reproduce the issue:
Use hexedit to modify the last byte of a large file on the Zone A mount point.
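A non-interactive equivalent of the hexedit step, run against the Zone A mount (the file path is a placeholder):
# Overwrite the last byte of the file in place, without changing its size
$ FILE=/mnt/xfs/win-22h2.iso
$ SIZE=$(stat -c %s "$FILE")
$ printf '\x00' | dd of="$FILE" bs=1 seek=$((SIZE - 1)) conv=notrunc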
The full output of the command that failed:
Expected results: Cascading geo-replication should propagate even a small change in the source volume all the way to the final zone.
Mandatory info:
- The output of the gluster volume info command:
# Zone A
Volume Name: xfs
Type: Replicate
Volume ID: 674c77bc-7332-4067-982c-6c3642fe0be5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs-a1:/mnt/brick/abc
Brick2: gfs-a2:/mnt/brick/abc
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
# Zone B
Volume Name: xfs
Type: Replicate
Volume ID: 5e3737f1-00bf-4328-afc5-584fe0aedc05
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs-b1:/mnt/brick/abc
Brick2: gfs-b2:/mnt/brick/abc
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.read-only: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
# Zone C
Volume Name: xfs
Type: Distribute
Volume ID: aa6f13bd-b067-4d18-b726-ea02435439f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gfs-c1:/mnt/brick/abc
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
- The output of the gluster volume status command:
# Zone A
Status of volume: xfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gfs-a1:/mnt/brick/abc 59424 0 Y 2863494
Brick gfs-a2:/mnt/brick/abc 53895 0 Y 2433056
Self-heal Daemon on localhost N/A N/A Y 2843715
Self-heal Daemon on gfs-a2 N/A N/A Y 481613
Self-heal Daemon on gfs-a3 N/A N/A Y 1635
Task Status of Volume xfs
------------------------------------------------------------------------------
There are no active volume tasks
# Zone B
Status of volume: xfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gfs-b1:/mnt/brick/abc 58327 0 Y 3316172
Brick gfs-b2:/mnt/brick/abc 50112 0 Y 3817308
Self-heal Daemon on localhost N/A N/A Y 3820796
Self-heal Daemon on gfs-b1 N/A N/A Y 3318985
Self-heal Daemon on gfs-b3 N/A N/A Y 51016
Task Status of Volume xfs
------------------------------------------------------------------------------
There are no active volume tasks
# Zone C
Status of volume: xfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gfs-c1:/mnt/brick/abc 52939 0 Y 800521
Task Status of Volume xfs
------------------------------------------------------------------------------
There are no active volume tasks
- The output of the gluster volume heal command:
# Zone A
Brick gfs-a1:/mnt/brick/abc
Status: Connected
Number of entries: 0
Brick gfs-a2:/mnt/brick/abc
Status: Connected
Number of entries: 0
# Zone B
Brick gfs-b1:/mnt/brick/abc
Status: Connected
Number of entries: 0
Brick gfs-b2:/mnt/brick/abc
Status: Connected
Number of entries: 0
# Zone C (distributed volume)
This command is supported for only volumes of replicate/disperse type. Volume xfs is not of type replicate/disperse
- Provide logs present on following locations of client and server nodes:
==> /var/log/glusterfs/geo-replication/xfs_gfs-c1_xfs/gsyncd.log <==
[2023-07-13 04:27:19.947517] I [primary(worker /mnt/brick/abc):1524:crawl] _GPrimary: secondary's time [{stime=(1689222374, 0)}]
==> /var/log/glusterfs/geo-replication/xfs_gfs-c1_xfs/mnt-mnt-brick-abc.log <==
[2023-07-13 04:27:20.201782 +0000] I [MSGID: 108031] [afr-common.c:3201:afr_local_discovery_cbk] 0-xfs-replicate-0: selecting local read_child xfs-client-0
==> /var/log/glusterfs/geo-replication/xfs_gfs-c1_xfs/gsyncd.log <==
[2023-07-13 04:27:20.327038] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=3}, {num_files=1}, {return_code=23}, {duration=0.1541}]
[2023-07-13 04:27:20.327319] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:21.406110] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=1}, {num_files=1}, {return_code=23}, {duration=0.1736}]
[2023-07-13 04:27:21.406388] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:22.465287] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=2}, {num_files=1}, {return_code=23}, {duration=0.1745}]
[2023-07-13 04:27:22.465494] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:23.503807] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=3}, {num_files=1}, {return_code=23}, {duration=0.1731}]
[2023-07-13 04:27:23.504027] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:24.584944] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=1}, {num_files=1}, {return_code=23}, {duration=0.1752}]
[2023-07-13 04:27:24.585249] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:25.643880] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=2}, {num_files=1}, {return_code=23}, {duration=0.1752}]
[2023-07-13 04:27:25.644174] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:26.680072] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=3}, {num_files=1}, {return_code=23}, {duration=0.1727}]
[2023-07-13 04:27:26.680316] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:27.762776] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=1}, {num_files=1}, {return_code=23}, {duration=0.1741}]
[2023-07-13 04:27:27.762966] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:28.820977] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=2}, {num_files=1}, {return_code=23}, {duration=0.1735}]
[2023-07-13 04:27:28.821195] W [primary(worker /mnt/brick/abc):1410:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:29.836832] I [primary(worker /mnt/brick/abc):2009:syncjob] Syncer: Sync Time Taken [{job=3}, {num_files=1}, {return_code=23}, {duration=0.1531}]
[2023-07-13 04:27:29.837025] E [primary(worker /mnt/brick/abc):1388:process] _GPrimary: changelogs could not be processed completely - moving on... [{files=['CHANGELOG.1689222435']}]
[2023-07-13 04:27:29.864885] I [primary(worker /mnt/brick/abc):1429:process] _GPrimary: Entry Time Taken [{UNL=0}, {RMD=0}, {CRE=0}, {MKN=0}, {MKD=0}, {REN=0}, {LIN=0}, {SYM=0}, {duration=0.0000}]
[2023-07-13 04:27:29.864932] I [primary(worker /mnt/brick/abc):1441:process] _GPrimary: Data/Metadata Time Taken [{SETA=0}, {meta_duration=0.0000}, {SETX=0}, {XATT=0}, {DATA=0}, {data_duration=1689222449.8649}]
[2023-07-13 04:27:29.865018] I [primary(worker /mnt/brick/abc):1451:process] _GPrimary: Batch Completed [{mode=live_changelog}, {duration=9.9060}, {changelog_start=1689222435}, {changelog_end=1689222435}, {num_changelogs=1}, {stime=(1689222434, 0)}, {entry_stime=(1689212054, 0)}]
[2023-07-13 04:27:34.884063] I [primary(worker /mnt/brick/abc):1524:crawl] _GPrimary: secondary's time [{stime=(1689222434, 0)}]
- Is there any crash? Provide the backtrace and coredump: No crash happened.
Additional info:
- The operating system / glusterfs version:
Rocky Linux 9.2 (kernel 5.14.0-284.18.1.el9_2.x86_64)
glusterfs 10.4
I found a workaround, described below. I suspect this is a bug in how the intermediate zone runs its rsync task in a cascading geo-replication setup.
When I update the last byte of the file in Zone A and the geo-replication task starts, this is what I see for the Zone A -> Zone B session:
# Show .processing directory
$ ls -l /var/lib/misc/gluster/gsyncd/..././.processing
total 4
-rw-r--r-- 1 root root 39 Jul 27 21:42 CHANGELOG.1690465326
# Show content of CHANGELOG file
$ cat CHANGELOG.1690465326
D c8b7999b-78ef-4894-b35a-4b413070194c
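The D record is a data operation on that gfid. To double-check which file the gfid refers to, the hardlink under the brick's .glusterfs directory can be inspected (a sketch; the brick path is the one from the logs below, and the first two byte pairs of the gfid form the subdirectory names):
$ GFID=c8b7999b-78ef-4894-b35a-4b413070194c
$ ls -li /mnt/brick/b/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
# the inode number shown above matches the actual file on the brick,
# which can then be located with: find /mnt/brick/b -inum <inode>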
This is the log content showing that rsync does transfer the file (see "Number of regular files transferred: 1"):
[2023-07-27 13:42:10.323244] D [primary(worker /mnt/brick/b):317:a_syncdata] _GPrimary: files [{files={'.gfid/c8b7999b-78ef-4894-b35a-4b413070194c'}}]
[2023-07-27 13:42:10.323301] D [primary(worker /mnt/brick/b):320:a_syncdata] _GPrimary: candidate for syncing [{file=.gfid/c8b7999b-78ef-4894-b35a-4b413070194c}]
[2023-07-27 13:42:10.353568] D [resource(worker /mnt/brick/b):1442:rsync] SSH: files: .gfid/c8b7999b-78ef-4894-b35a-4b413070194c
[2023-07-27 13:42:39.128166] I [resource(worker /mnt/brick/b):1530:rsync] SSH: rsync verbose [{
data='building file list ... done
.gfid/c8b7999b-78ef-4894-b35a-4b413070194c
Number of files: 2 (reg: 1, dir: 1)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 1
Total file size: 5,842,186,240 bytes
Total transferred file size: 5,842,186,240 bytes
Literal data: 29,888 bytes
Matched data: 5,842,156,352 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 335,908
Total bytes received: 611,565
sent 335,908 bytes received 611,565 bytes 32,117.73 bytes/sec
total size is 5,842,186,240 speedup is 6,166.07
'}]
[2023-07-27 13:42:39.128545] D [primary(worker /mnt/brick/b):326:regjob] _GPrimary: synced [{file=.gfid/c8b7999b-78ef-4894-b35a-4b413070194c}]
Next, on the intermediate node in Zone B:
# Show .processing directory
$ ls -l /var/lib/misc/gluster/gsyncd/..././.processing
total 4
-rw-r--r-- 1 root root 48 Jul 27 21:50 CHANGELOG.1690465836
# Show content of CHANGELOG file
$ cat CHANGELOG.1690465836
D c8b7999b-78ef-4894-b35a-4b413070194c
M c8b7999b-78ef-4894-b35a-4b413070194c SETATTR
M 00000000-0000-0000-0000-000000000001 SETXATTR
This is the log content showing that rsync does not transfer anything ("Number of regular files transferred: 0"), yet the timestamp of the file on Zone C is updated.
[{batch=['/var/lib/misc/gluster/gsyncd/b_gfs-c1_b/mnt-brick-b/.processing/CHANGELOG.1690465836']}]
[2023-07-27 13:50:38.544077] D [primary(worker /mnt/brick/b):1343:process] _GPrimary: processing change [{changelog=/var/lib/misc/gluster/gsyncd/b_gfs-c1_b/mnt-brick-b/.processing/CHANGELOG.1690465836}]
[2023-07-27 13:50:38.544209] D [primary(worker /mnt/brick/b):1207:process_change] _GPrimary: entries: []
[2023-07-27 13:50:38.545924] D [repce(worker /mnt/brick/b):195:push] RepceClient: call 2018616:139904444790592:1690465838.5458996 meta_ops([{'op': 'META', 'skip_entry': False, 'go': '.gfid/c8b7999b-78ef-4894-b35a-4b413070194c', 'stat': {'uid': 0, 'gid': 0, 'mode': 33188, 'atime': 1690355048.1178703, 'mtime': 1690465789.6717978}}],) ...
[2023-07-27 13:50:38.547595] D [repce(worker /mnt/brick/b):215:__call__] RepceClient: call 2018616:139904444790592:1690465838.5458996 meta_ops -> []
[2023-07-27 13:50:38.548894] D [primary(worker /mnt/brick/b):317:a_syncdata] _GPrimary: files [{files={'.gfid/c8b7999b-78ef-4894-b35a-4b413070194c'}}]
[2023-07-27 13:50:38.548938] D [primary(worker /mnt/brick/b):320:a_syncdata] _GPrimary: candidate for syncing [{file=.gfid/c8b7999b-78ef-4894-b35a-4b413070194c}]
[2023-07-27 13:50:38.842242] D [resource(worker /mnt/brick/b):1442:rsync] SSH: files: .gfid/c8b7999b-78ef-4894-b35a-4b413070194c
[2023-07-27 13:50:38.910505] I [resource(worker /mnt/brick/b):1530:rsync] SSH: rsync verbose [{
data='building file list ... done
Number of files: 2 (reg: 1, dir: 1)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 0
Total file size: 5,842,186,240 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 216
Total bytes received: 17
sent 216 bytes received 17 bytes 466.00 bytes/sec
total size is 5,842,186,240 speedup is 25,073,760.69
'}]
[2023-07-27 13:50:38.910670] I [primary(worker /mnt/brick/b):2009:syncjob] Syncer: Sync Time Taken [{job=2}, {num_files=1}, {return_code=0}, {duration=0.0684}]
Here is a summary:
# Zone A CHANGELOG
D c8b7999b-78ef-4894-b35a-4b413070194c
# Zone B CHANGELOG
D c8b7999b-78ef-4894-b35a-4b413070194c
M c8b7999b-78ef-4894-b35a-4b413070194c SETATTR
M 00000000-0000-0000-0000-000000000001 SETXATTR
On Zone B, both the file's timestamp and content are updated. On Zone C, the file's timestamp is updated but the content is not.
This appears to be rsync's quick-check heuristic (by default rsync skips a file whose size and modification time already match on the receiver) rather than a weakness of the delta-transfer algorithm itself. Zone C's timestamp has already been updated by this changelog entry:
M c8b7999b-78ef-4894-b35a-4b413070194c SETATTR
before the rsync data operation runs:
rsync -v -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls --existing ./ gfs-c1:/mnt/brick/b
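The behaviour can be reproduced with rsync alone, outside gluster (a standalone sketch, not taken from the gluster setup): when the destination file already has the same size and mtime as the source, rsync's quick check skips it even though the contents differ, and -I (--ignore-times) forces the transfer.
# Two files with different content but identical size and mtime
$ printf 'AAAA' > src; printf 'AAAB' > dst
$ touch -r src dst
$ rsync -a --inplace src dst                 # quick check passes, dst is NOT updated
$ cmp -s src dst || echo still-different
still-different
$ rsync -aI --inplace src dst                # --ignore-times disables the quick check
$ cmp -s src dst && echo now-identical
now-identical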
My workaround is to add the rsync ignore-times flag (-I) to the geo-replication session's rsync_options config:
sudo gluster volume geo-replication $VOL $GEO_USER@$H2::$VOL config rsync_options '-I'
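With this option, the change propagates from Zone B to Zone C again. The trade-off is that -I makes rsync run its delta transfer even on files whose size and mtime already match, reading the candidate files in full on both ends; since geo-replication passes only the changed gfids via --files-from, the extra cost stays limited to files that changed anyway. The setting can be read back afterwards (same variables as above; I believe querying a single option works like this):
$ sudo gluster volume geo-replication $VOL $GEO_USER@$H2::$VOL config rsync_options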