glusterfs
After shard is enabled, the size of the copied file is inconsistent with the original file
The volume configuration is as follows:

Volume Name: data
Type: Replicate
Volume ID: 02c625c8-a097-46fd-b913-76a53f286ff7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/export/heketi/node_d5071/device_65fa5/data_953e3
Brick2: node3:/export/heketi/node_9d13b/device_dfd0a/data_770ff
Brick3: node2:/export/heketi/node_016de/device_56b70/data_762bc
Options Reconfigured:
performance.write-behind: on
diagnostics.brick-log-level: INFO
diagnostics.client-log-level: INFO
features.shard: on
features.shard-block-size: 1024MB
user.heketi.id: 85b4bccd7ffd0c6d97658cb5badbe3ae
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
client.event-threads: 1
Mount volume data on node1: mount -t glusterfs node1:/data /mnt
Generate a data6.img file in the /root path: dd if=/dev/zero of=/root/data6.img bs=128k count=11
View its md5 value and file size:

[root@node1 ~]# md5sum /root/data6.img
2aabc019f6b5d881028999f055f5ff14  /root/data6.img
[root@node1 ~]# ls -l /root/data6.img
-rw-r--r-- 1 root root 1441792 Aug 14 14:19 /root/data6.img
Copy the data6.img file to the /mnt folder: cp /root/data6.img /mnt/
Check the md5 and file size of data6.img under /mnt; with a certain probability, both are inconsistent with the original file:

[root@node1 ~]# md5sum /mnt/data6.img
b98f319ebcfe36f416c0b7d9281f85ff  /mnt/data6.img
[root@node1 ~]# ls -l /mnt/data6.img
-rw-r--r-- 1 root root 2359296 Aug 14 14:19 /mnt/data6.img
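Because the mismatch is probabilistic, it helps to repeat the copy-and-compare step in a loop until a bad copy appears. A minimal sketch of such a check (the check_copy helper is illustrative, not part of the original report; the /mnt mount point comes from the steps above):

```shell
#!/bin/sh
# Copy a file into a destination directory and verify that the copy's md5
# matches the source's. Returns non-zero on a mismatch (or a failed cp),
# so a loop can stop at the first corrupted copy.
check_copy() {
    src="$1"
    dstdir="$2"
    cp "$src" "$dstdir/" || return 2
    src_sum=$(md5sum "$src" | awk '{print $1}')
    dst_sum=$(md5sum "$dstdir/$(basename "$src")" | awk '{print $1}')
    [ "$src_sum" = "$dst_sum" ]
}

# Example: retry the copy from the report until the corruption shows up.
# i=0
# while check_copy /root/data6.img /mnt; do
#     i=$((i + 1))
#     rm -f /mnt/data6.img
# done
# echo "mismatch after $i good copies"
```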
Through log and gdb tracing, we found that during the copy, when shard_common_inode_write_do_cbk -> shard_get_delta_size_from_inode_ctx computed local->delta_size, the ctx->stat.ia_size value was significantly smaller than expected, so the computed local->delta_size was larger than the size actually being added. Further tracing showed that, with a certain probability, the file inode's ctx->refresh was set to _gf_true during the copy, which caused shard_lookup_base_file_cbk -> shard_inode_ctx_set to run when the next write was issued. It was precisely this update that changed ctx->stat.ia_size and, in turn, made shard_get_delta_size_from_inode_ctx miscalculate local->delta_size.
Why is ctx->refresh of the file inode set to _gf_true with a certain probability during file writes? In our environment it is most likely related to an upper-layer application that frequently reads the contents of the /mnt folder: according to the shard_readdirp code in glusterfs, it sets ctx->refresh to _gf_true under certain conditions.
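To raise the odds of hitting the race, the application's directory polling can be simulated with a background listing loop while the copy runs in another shell. A hedged sketch (the list_loop helper is illustrative; /mnt is the mount from the reproduction steps):

```shell
#!/bin/sh
# Repeatedly list a directory to mimic an application polling the mount,
# which drives readdirp traffic against the volume while writes are active.
list_loop() {
    dir="$1"
    n="$2"
    i=0
    while [ "$i" -lt "$n" ]; do
        ls -l "$dir" > /dev/null
        i=$((i + 1))
    done
}

# Example: keep readdir pressure on the mount while copying elsewhere.
# list_loop /mnt 100000 &
# cp /root/data6.img /mnt/
```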
Another interesting observation is that the problem does not seem to occur when performance.write-behind is turned off. I don't know whether the two are connected.
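For anyone trying to confirm the same correlation, write-behind can be toggled on the volume with the standard gluster CLI (the volume name data comes from the configuration above); this is a config fragment, not a runnable test:

```shell
# Disable write-behind on the volume, then re-run the copy test.
gluster volume set data performance.write-behind off

# Re-enable it afterwards to restore the original configuration.
gluster volume set data performance.write-behind on
```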