
Duplicate files created during file rename/modification during rebalance

Open irischad opened this issue 4 years ago • 4 comments

Description of problem: During a rebalance, some file names being modified (renamed then recreated and edited) end up being duplicated between different brick pairs. That is, the original version of the file and new version both exist, but on different bricks, and show up side by side on the client.

This was discovered while running a script that loops over directories and does the following inside each directory:

1. mv file.txt file.txt.bak
2. copy some text from "file.txt.bak" into a new "file.txt", using awk and >>

The script errored out because, after step 1, "file.txt.bak" did not exist. Instead, there were two (2) copies of "file.txt": the original and a new, empty one.

From the client side:

ls -li /path/to/1/file*
10766646257784926732 -rw-rw-r-- 1 user group 91757 May 21 23:26 /path/to/1/file.txt 
11870960232188759349 -rw-rw-r-- 1 user group     0 Jun 21 15:44 /path/to/1/file.txt

The two entries clearly have a different date and inode.
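Since a duplicated entry shows up as the same path twice in a listing, affected files can be enumerated from the client side. A minimal sketch, assuming a glusterfs mount point (`/mnt/gv0` is a placeholder, not from this report):

```shell
# Sweep the client mount for paths that appear more than once: a
# duplicated directory entry should yield the same path twice in the
# find output, so `uniq -d` reports it. MOUNT is an assumed placeholder.
MOUNT="${MOUNT:-/mnt/gv0}"
if [ -d "$MOUNT" ]; then
  find "$MOUNT" -type f | sort | uniq -d
fi
```

On a healthy tree this prints nothing; each duplicated name is reported once.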

Investigating the error showed that the original version of file.txt existed on one of the old brick pairs, and the newly created "empty" file.txt existed on one of the new brick pairs.

That is, from the server side:

ls -lh /export/brick*/srv/path/to/1/file*
-rw-rw-r-- 2 uid gid 90K May 21 23:26 /export/brick0/srv/path/to/1/file.txt
-rw-rw-r-- 2 uid gid   0 Jun 21 15:44 /export/brick3/srv/path/to/1/file.txt

This was from one of the directories in question; the issue occurred in several directories.
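Rebalance moves files between subvolumes via zero-byte DHT "linkto" pointer files, so one hedged diagnostic is to look for stray pointers on the bricks. A sketch to run on a server node (brick roots follow the layout in this report; whether pointers are actually involved here is an assumption):

```shell
# DHT linkto pointer files are zero-byte files whose mode is exactly
# ---------T (sticky bit only, octal 1000). List any found on the bricks.
for b in /export/brick0/srv /export/brick1/srv /export/brick2/srv /export/brick3/srv ; do
  [ -d "$b" ] || continue     # skip bricks not present on this node
  find "$b" -type f -perm 1000 -size 0
done
```

The subvolume a pointer refers to is stored in the trusted.glusterfs.dht.linkto xattr, which can be read with `getfattr -n trusted.glusterfs.dht.linkto -e text <file>` on the brick.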

The exact command to reproduce the issue:

for i in * ; do
  cd "$i" || continue
  mv file.txt file.txt.bak
  head -1 file.txt.bak > file.txt
  tail -n +2 file.txt.bak | awk '{... stuff ...}' >> file.txt
  cd ..
done
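Since the observed failure is "file.txt.bak" going missing immediately after the mv, a defensive variant of the loop can verify the rename actually landed before writing the new file. This is a hedged workaround sketch, not the reporter's command; the awk transform stays elided as in the original:

```shell
# Defensive variant (sketch): confirm file.txt.bak exists after the
# rename, with a short retry, before creating the new file.txt.
for i in */ ; do
  (
    cd "$i" || exit 0
    [ -e file.txt ] || exit 0           # nothing to do in this directory
    mv file.txt file.txt.bak || exit 0
    for attempt in 1 2 3; do            # brief retry for the rename to settle
      [ -e file.txt.bak ] && break
      sleep 1
    done
    [ -e file.txt.bak ] || { echo "rename lost in $i" >&2; exit 1; }
    head -1 file.txt.bak > file.txt
    tail -n +2 file.txt.bak >> file.txt # awk transform elided, as in the report
  )
done
```

Running each directory in a subshell also avoids the `cd ..` bookkeeping of the original loop.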

The full output of the command that failed: head complains that file.txt.bak doesn't exist

Expected results: head and tail should be able to read the file

Mandatory info: - The output of the gluster volume info command:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 6db232a9-e7a1-46ab-bca4-bc69cf5cb68e
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gluster00:/export/brick0/srv
Brick2: gluster01:/export/brick0/srv
Brick3: gluster00:/export/brick1/srv
Brick4: gluster01:/export/brick1/srv
Brick5: gluster00:/export/brick2/srv
Brick6: gluster01:/export/brick2/srv
Brick7: gluster00:/export/brick3/srv
Brick8: gluster01:/export/brick3/srv
Options Reconfigured:
cluster.rebal-throttle: lazy
server.outstanding-rpc-limit: 512
network.inode-lru-limit: 1000000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32
performance.cache-size: 1GB
client.event-threads: 3
server.event-threads: 16
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

- The output of the gluster volume status command:

Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster00:/export/brick0/srv          49152     0          Y       2708 
Brick gluster01:/export/brick0/srv          49152     0          Y       2677 
Brick gluster00:/export/brick1/srv          49153     0          Y       2725 
Brick gluster01:/export/brick1/srv          49153     0          Y       2687 
Brick gluster00:/export/brick2/srv          49154     0          Y       2738 
Brick gluster01:/export/brick2/srv          49154     0          Y       2701 
Brick gluster00:/export/brick3/srv          49155     0          Y       14733
Brick gluster01:/export/brick3/srv          49155     0          Y       19711
Self-heal Daemon on localhost               N/A       N/A        Y       2825 
Self-heal Daemon on gluster01               N/A       N/A        Y       2770 
 
Task Status of Volume gv0
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 20961d7c-c559-48d8-b9a9-f9393ecd7b4e
Status               : in progress  

- The output of the gluster volume heal command: gluster volume heal gv0 info

Brick gluster00:/export/brick0/srv
Status: Connected
Number of entries: 0

Brick gluster01:/export/brick0/srv
Status: Connected
Number of entries: 0

Brick gluster00:/export/brick1/srv
Status: Connected
Number of entries: 0

Brick gluster01:/export/brick1/srv
Status: Connected
Number of entries: 0

Brick gluster00:/export/brick2/srv
Status: Connected
Number of entries: 0

Brick gluster01:/export/brick2/srv
Status: Connected
Number of entries: 0

Brick gluster00:/export/brick3/srv
Status: Connected
Number of entries: 0

Brick gluster01:/export/brick3/srv
Status: Connected
Number of entries: 0

gluster volume heal gv0 info split-brain

Brick gluster00:/export/brick0/srv
Status: Connected
Number of entries in split-brain: 0

Brick gluster01:/export/brick0/srv
Status: Connected
Number of entries in split-brain: 0

Brick gluster00:/export/brick1/srv
Status: Connected
Number of entries in split-brain: 0

Brick gluster01:/export/brick1/srv
Status: Connected
Number of entries in split-brain: 0

Brick gluster00:/export/brick2/srv
Status: Connected
Number of entries in split-brain: 0

Brick gluster01:/export/brick2/srv
Status: Connected
Number of entries in split-brain: 0

Brick gluster00:/export/brick3/srv
Status: Connected
Number of entries in split-brain: 0

Brick gluster01:/export/brick3/srv
Status: Connected
Number of entries in split-brain: 0

- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/

Server side logs didn't show anything of interest (beyond the ongoing rebalance).

The client side logs showed a few items of interest around the time in question:

[2021-06-21 22:35:21.800323 +0000] I [MSGID: 109064] [dht-rename.c:1955:dht_rename] 0-gv0-dht: renaming /path/to/0/file.txt (55df0b03-43e2-486c-b08e-7b8d4dc183ce) (hash=gv0-replicate-3/cache=gv0-replicate-1) => /path/to/0/file.txt.bak ((null)) (hash=gv0-replicate-0/cache=<nul>)  
[2021-06-21 22:35:22.355723 +0000] W [MSGID: 109034] [dht-rename.c:784:dht_rename_unlink_cbk] 0-gv0-dht: /path/to/0/file.txt: Rename: unlink on gv0-replicate-3 failed  [No such file or directory]
[2021-06-21 22:39:08.063494 +0000] I [MSGID: 109064] [dht-rename.c:1955:dht_rename] 0-gv0-dht: renaming /path/to/0/file.txt.bak (55df0b03-43e2-486c-b08e-7b8d4dc183ce) (hash=gv0-replicate-0/cache=gv0-replicate-1) => /path/to/0/file.txt (56450d24-6e8e-49c9-b620-e0cc0da02ba0) (hash=gv0-replicate-3/cache=gv0-replicate-3)  
[2021-06-21 22:39:22.858007 +0000] I [MSGID: 109064] [dht-rename.c:1955:dht_rename] 0-gv0-dht: renaming /path/to/0/file.txt (55df0b03-43e2-486c-b08e-7b8d4dc183ce) (hash=gv0-replicate-3/cache=gv0-replicate-1) => /path/to/0/file.txt.bak ((null)) (hash=gv0-replicate-0/cache=<nul>)  
[2021-06-21 22:41:21.950443 +0000] I [MSGID: 109064] [dht-rename.c:1955:dht_rename] 0-gv0-dht: renaming /path/to/0/file.txt.bak (55df0b03-43e2-486c-b08e-7b8d4dc183ce) (hash=gv0-replicate-0/cache=gv0-replicate-1) => /path/to/0/file.txt (d7f0efa3-a47d-4f6a-b4a2-b507afed8ffd) (hash=gv0-replicate-3/cache=gv0-replicate-3)  
[2021-06-21 22:44:21.293789 +0000] I [MSGID: 109064] [dht-rename.c:1955:dht_rename] 0-gv0-dht: renaming /path/to/0/file.txt (55df0b03-43e2-486c-b08e-7b8d4dc183ce) (hash=gv0-replicate-3/cache=gv0-replicate-1) => /path/to/0/file.txt.bak ((null)) (hash=gv0-replicate-0/cache=<nul>)  
[2021-06-21 22:44:22.309415 +0000] I [MSGID: 109064] [dht-rename.c:1955:dht_rename] 0-gv0-dht: renaming /path/to/102/file.txt (db791bc8-a658-49a2-b424-6428a5e41471) (hash=gv0-replicate-1/cache=gv0-replicate-1) => /path/to/102/file.txt.bak ((null)) (hash=gv0-replicate-0/cache=<nul>) 

Interestingly, the issue is more widespread than the logs account for.

- Is there any crash? Provide the backtrace and coredump: No crash

Additional info: Client mount options (from /etc/fstab)

gluster:gv0/path /path glusterfs _netdev,defaults,log-level=WARNING,reader-thread-count=4 0 0

- The operating system / glusterfs version: Server: CentOS Linux release 7.9.2009 (Core) kernel: 3.10.0-1160.25.1.el7.x86_64

gluster packages:

# rpm -qa | grep gluster
centos-release-gluster9-1.0-1.el7.noarch
glusterfs-9.2-1.el7.x86_64
glusterfs-cli-9.2-1.el7.x86_64
glusterfs-client-xlators-9.2-1.el7.x86_64
glusterfs-fuse-9.2-1.el7.x86_64
glusterfs-server-9.2-1.el7.x86_64
libglusterd0-9.2-1.el7.x86_64
libglusterfs0-9.2-1.el7.x86_64
nfs-ganesha-gluster-3.5-1.el7.x86_64

Client: Ubuntu 16.04.5 LTS kernel: 4.15.0-123-generic

gluster packages:

# dpkg -l | grep gluster
ii  glusterfs-client                                9.2-ubuntu1~xenial1                             amd64        clustered file-system (client package)
ii  glusterfs-common                                9.2-ubuntu1~xenial1                             amd64        GlusterFS common libraries and translator modules

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

irischad avatar Jun 23 '21 19:06 irischad

I should also add that the directories included in the log snippet above were not affected by this behavior. That is, the file move -> create -> modification sequence worked correctly in those directories (0 and 102).

There appear to be no log entries listing any activity in the directories where the failures occurred.

irischad avatar Jun 23 '21 23:06 irischad

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] avatar Jan 21 '22 04:01 stale[bot]

I wanted to follow up here that this issue resolved itself after the rebalance completed, but since we have an extremely large data set, this took some time to complete.

irischad avatar Jan 21 '22 16:01 irischad

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] avatar Sep 21 '22 00:09 stale[bot]

Closing this issue as there has been no update since my last update. If this issue is still valid, feel free to reopen it.

stale[bot] avatar Oct 22 '22 18:10 stale[bot]