glusterfs
[bug:1724043] [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter
URL: https://bugzilla.redhat.com/1724043 Creator: bugzilla-bot at gluster.org Time: 20190626T06:58:13
REVIEW: https://review.gluster.org/22945 ([WIP] Georep: make passive worker sync self-heal traffic) posted (#1) for review on master by hari gowtham
Time: 20190626T07:01:19 hgowtham at redhat commented: Description of problem:
While converting 2x2 to 2x(2+1) (arbiter), there was a checksum mismatch:
[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/master/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links   : 3e9
Other           : 3e9

Checksums
Regular files   : 8e69e8576625d36f9ee1866c92bfb6a3
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 2fbf69488baa3ac7

[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/slave/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 53c64bd1144f6d9855f0af3edb55e614
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 3901e39cb02ad487
Everything matches except under "Checksums": the Regular files entry and the Total are mismatched.
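A small helper makes it easy to pinpoint exactly which arequal fields diverge between two mounts. The sketch below (not part of arequal itself; field samples are taken from the reports above) parses the `Label : value` layout of arequal-checksum output and diffs it field by field:

```python
# Compare two arequal-checksum reports field by field and list mismatches.
# MASTER/SLAVE below hold the "Checksums" sections from this bug report.

def parse_arequal(text):
    """Parse 'Label : value' lines into a {(section, label): value} dict."""
    fields = {}
    section = ""
    for line in text.splitlines():
        if ":" in line:
            label, _, value = line.partition(":")
            fields[(section, label.strip())] = value.strip()
        elif line.strip():
            section = line.strip()  # a section header, e.g. "Checksums"
    return fields

MASTER = """\
Checksums
Regular files  : 8e69e8576625d36f9ee1866c92bfb6a3
Directories    : 4a596e7e1e792061
Total          : 2fbf69488baa3ac7
"""

SLAVE = """\
Checksums
Regular files  : 53c64bd1144f6d9855f0af3edb55e614
Directories    : 4a596e7e1e792061
Total          : 3901e39cb02ad487
"""

def diff_reports(a, b):
    """Return the sorted list of (section, label) keys whose values differ."""
    fa, fb = parse_arequal(a), parse_arequal(b)
    return sorted(key for key in fa if fa[key] != fb.get(key))

for section, label in diff_reports(MASTER, SLAVE):
    print(f"MISMATCH: {section} / {label}")
# prints:
# MISMATCH: Checksums / Regular files
# MISMATCH: Checksums / Total
```

On the data above this flags exactly the two fields called out in the report: the regular-file checksum and the total.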
Version-Release number of selected component (if applicable):
glusterfs-3.12.2-45.el7rhgs.x86_64
How reproducible:
2/2
Steps to Reproduce:
- Create and start a geo-rep session with master and slave being 2x2
- Mount the vols and start pumping data
- Disable and stop self healing (prior to add-brick)
gluster volume set VOLNAME cluster.data-self-heal off
gluster volume set VOLNAME cluster.metadata-self-heal off
gluster volume set VOLNAME cluster.entry-self-heal off
gluster volume set VOLNAME self-heal-daemon off
- Add brick to the master and slave to convert them to 2x(2+1) arbiter vols
- Start rebalance on master and slave
- Re-enable self healing:
gluster volume set VOLNAME cluster.data-self-heal on
gluster volume set VOLNAME cluster.metadata-self-heal on
gluster volume set VOLNAME cluster.entry-self-heal on
gluster volume set VOLNAME self-heal-daemon on
- Wait for rebalance to complete
- Check the checksum between master and slave
Actual results:
Checksum does not fully match
Expected results:
Checksum should match
Time: 20200204T09:16:45 sunkumar at redhat commented: *** Bug 1686568 has been marked as a duplicate of this bug. ***
A patch https://review.gluster.org/22945 has been posted that references this issue.
Georep: make passive worker sync self-heal traffic
Problem: When the worker corresponding to a brick that was down becomes Active after the brick comes back up, there is a race window in which the I/O that happened while the brick was down is not synced to the slave. This can result in data loss on the slave.
Fix: Since the lost fops arrive as self-heal traffic, make the passive workers sync self-heal traffic as well. This ensures that a brick which turned from active to passive has synced all of its fops.
Steps:
- make the passive worker sync MKNOD
- record the time of this sync as the stime; when the worker becomes active, it resumes syncing from this stime
- store the cluster stime as a new xattr
- to keep the passive worker's stime from crossing the active worker's stime, check the cluster stime before proceeding
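The stime rule in the last two steps can be illustrated with a short sketch. This is not the actual gsyncd code; the function names, the integer stimes, and the brick names are all assumptions made purely for illustration. The idea is that the cluster stime is bounded by the slowest brick, and a passive worker's recorded stime is clamped so it can never cross it:

```python
# Illustrative sketch (NOT the actual gsyncd implementation) of the
# cluster-stime rule described in the patch: a passive worker must never
# record an stime beyond the cluster stime, which is bounded by the
# slowest brick's stime.

def cluster_stime(brick_stimes):
    """Cluster stime: the minimum per-brick stime bounds overall progress."""
    return min(brick_stimes.values())

def passive_record_stime(candidate, brick_stimes):
    """Clamp a passive worker's candidate stime to the cluster stime so it
    cannot cross the active worker's stime."""
    return min(candidate, cluster_stime(brick_stimes))

# Hypothetical per-brick stimes (plain integers for simplicity):
stimes = {"brick-a": 1000, "brick-b": 800}
print(passive_record_stime(950, stimes))  # clamped to 800
print(passive_record_stime(700, stimes))  # below cluster stime, kept as 700
```

When such a worker later becomes active, resuming from this clamped stime guarantees it re-walks any changelog interval the slower brick has not yet covered, rather than skipping past fops it never saw.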
Change-Id: I96a703744cb211df03c69152594c812b1af87635 fixes: #1049 Signed-off-by: Hari Gowtham [email protected]
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.
Keep-alive ping..
Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.