Stale file handle during rebalancing
Description of problem:
We run a Distributed-Replicate volume. We recently added more bricks and started a rebalance on the servers. While the rebalance is running, reads from the mount intermittently fail with "Stale file handle" errors, which makes the volume partly unusable.
We suspect an inconsistency between the mount client and the servers: if we unmount and remount the volume, the errors disappear.
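Roughly, the sequence we performed was the following (a sketch, not an exact transcript; the brick list is abbreviated and the mount server is picked arbitrarily for illustration):

# on one of the servers: expand the volume, then start the rebalance
gluster volume add-brick volumename \
    hostname204:/mnt/volumename/01/brick hostname205:/mnt/volumename/01/brick hostname207:/mnt/volumename/01/brick ...
gluster volume rebalance volumename start
gluster volume rebalance volumename status    # still reports "in progress"

# on the client: the temporary workaround that clears the errors
umount /mnt/gfs
mount -t glusterfs hostname201:/volumename /mnt/gfs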
Our questions are:
- Is this expected behaviour during a rebalance?
- Is this issue specific to our glusterfs version?
- Is there any way to avoid it?
- Remounting fixes it for the moment, but the rebalance is still running in the background; will the issue reappear?
The exact command to reproduce the issue:
username@host /home/username$ ls /mnt/gfs/path/to/file
ls: cannot access /mnt/gfs/path/to/file: Stale file handle
The full output of the command that failed: see the ls output above.
Expected results: ls should list the file without a "Stale file handle" error.
Mandatory info:
- The output of the gluster volume info command:
Volume Name: volumename
Type: Distributed-Replicate
Volume ID: 2f45e6a4-7ff5-4138-ab59-18de97a2d39a
Status: Started
Snapshot Count: 0
Number of Bricks: 22 x 3 = 66
Transport-type: tcp
Bricks:
Brick1: hostname201:/mnt/volumename/01/brick
Brick2: hostname209:/mnt/volumename/01/brick
Brick3: hostname206:/mnt/volumename/01/brick
Brick4: hostname201:/mnt/volumename/02/brick
Brick5: hostname209:/mnt/volumename/02/brick
Brick6: hostname206:/mnt/volumename/02/brick
Brick7: hostname201:/mnt/volumename/03/brick
Brick8: hostname209:/mnt/volumename/03/brick
Brick9: hostname206:/mnt/volumename/03/brick
Brick10: hostname201:/mnt/volumename/04/brick
Brick11: hostname209:/mnt/volumename/04/brick
Brick12: hostname206:/mnt/volumename/04/brick
Brick13: hostname201:/mnt/volumename/05/brick
Brick14: hostname209:/mnt/volumename/05/brick
Brick15: hostname206:/mnt/volumename/05/brick
Brick16: hostname201:/mnt/volumename/06/brick
Brick17: hostname209:/mnt/volumename/06/brick
Brick18: hostname206:/mnt/volumename/06/brick
Brick19: hostname201:/mnt/volumename/07/brick
Brick20: hostname209:/mnt/volumename/07/brick
Brick21: hostname206:/mnt/volumename/07/brick
Brick22: hostname201:/mnt/volumename/08/brick
Brick23: hostname209:/mnt/volumename/08/brick
Brick24: hostname206:/mnt/volumename/08/brick
Brick25: hostname201:/mnt/volumename/09/brick
Brick26: hostname209:/mnt/volumename/09/brick
Brick27: hostname206:/mnt/volumename/09/brick
Brick28: hostname201:/mnt/volumename/10/brick
Brick29: hostname209:/mnt/volumename/10/brick
Brick30: hostname206:/mnt/volumename/10/brick
Brick31: hostname204:/mnt/volumename/01/brick
Brick32: hostname205:/mnt/volumename/01/brick
Brick33: hostname207:/mnt/volumename/01/brick
Brick34: hostname204:/mnt/volumename/02/brick
Brick35: hostname205:/mnt/volumename/02/brick
Brick36: hostname207:/mnt/volumename/02/brick
Brick37: hostname204:/mnt/volumename/03/brick
Brick38: hostname205:/mnt/volumename/03/brick
Brick39: hostname207:/mnt/volumename/03/brick
Brick40: hostname204:/mnt/volumename/04/brick
Brick41: hostname205:/mnt/volumename/04/brick
Brick42: hostname207:/mnt/volumename/04/brick
Brick43: hostname204:/mnt/volumename/05/brick
Brick44: hostname205:/mnt/volumename/05/brick
Brick45: hostname207:/mnt/volumename/05/brick
Brick46: hostname204:/mnt/volumename/06/brick
Brick47: hostname205:/mnt/volumename/06/brick
Brick48: hostname207:/mnt/volumename/06/brick
Brick49: hostname204:/mnt/volumename/07/brick
Brick50: hostname205:/mnt/volumename/07/brick
Brick51: hostname207:/mnt/volumename/07/brick
Brick52: hostname204:/mnt/volumename/08/brick
Brick53: hostname205:/mnt/volumename/08/brick
Brick54: hostname207:/mnt/volumename/08/brick
Brick55: hostname204:/mnt/volumename/09/brick
Brick56: hostname205:/mnt/volumename/09/brick
Brick57: hostname207:/mnt/volumename/09/brick
Brick58: hostname204:/mnt/volumename/10/brick
Brick59: hostname205:/mnt/volumename/10/brick
Brick60: hostname207:/mnt/volumename/10/brick
Brick61: hostname204:/mnt/volumename/11/brick
Brick62: hostname205:/mnt/volumename/11/brick
Brick63: hostname207:/mnt/volumename/11/brick
Brick64: hostname204:/mnt/volumename/12/brick
Brick65: hostname205:/mnt/volumename/12/brick
Brick66: hostname207:/mnt/volumename/12/brick
Options Reconfigured:
cluster.rebal-throttle: lazy
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.parallel-readdir: on
client.event-threads: 5
server.event-threads: 5
performance.io-thread-count: 16
performance.open-behind: off
- The output of the gluster volume status command:
Status of volume: volumename
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick hostname201:/mnt/volumename/01/brick 49154 0 Y 33014
Brick hostname209:/mnt/volumename/01/brick 49154 0 Y 34464
Brick hostname206:/mnt/volumename/01/brick 49154 0 Y 31296
Brick hostname201:/mnt/volumename/02/brick 49155 0 Y 33027
Brick hostname209:/mnt/volumename/02/brick 49155 0 Y 34477
Brick hostname206:/mnt/volumename/02/brick 49155 0 Y 31308
Brick hostname201:/mnt/volumename/03/brick 49156 0 Y 33039
Brick hostname209:/mnt/volumename/03/brick 49156 0 Y 34492
Brick hostname206:/mnt/volumename/03/brick 49156 0 Y 31319
Brick hostname201:/mnt/volumename/04/brick 49157 0 Y 33053
Brick hostname209:/mnt/volumename/04/brick 49157 0 Y 34506
Brick hostname206:/mnt/volumename/04/brick 49157 0 Y 31355
Brick hostname201:/mnt/volumename/05/brick 49158 0 Y 33068
Brick hostname209:/mnt/volumename/05/brick 49158 0 Y 34519
Brick hostname206:/mnt/volumename/05/brick 49158 0 Y 31359
Brick hostname201:/mnt/volumename/06/brick 49159 0 Y 33082
Brick hostname209:/mnt/volumename/06/brick 49159 0 Y 34535
Brick hostname206:/mnt/volumename/06/brick 49159 0 Y 31384
Brick hostname201:/mnt/volumename/07/brick 49160 0 Y 33095
Brick hostname209:/mnt/volumename/07/brick 49160 0 Y 34550
Brick hostname206:/mnt/volumename/07/brick 49160 0 Y 31401
Brick hostname201:/mnt/volumename/08/brick 49161 0 Y 33111
Brick hostname209:/mnt/volumename/08/brick 49161 0 Y 34562
Brick hostname206:/mnt/volumename/08/brick 49161 0 Y 31415
Brick hostname201:/mnt/volumename/09/brick 49162 0 Y 33130
Brick hostname209:/mnt/volumename/09/brick 49162 0 Y 34577
Brick hostname206:/mnt/volumename/09/brick 49162 0 Y 31435
Brick hostname201:/mnt/volumename/10/brick 49163 0 Y 33141
Brick hostname209:/mnt/volumename/10/brick 49163 0 Y 34590
Brick hostname206:/mnt/volumename/10/brick 49163 0 Y 31457
Brick hostname204:/mnt/volumename/01/brick 49152 0 Y 66213
Brick hostname205:/mnt/volumename/01/brick 49152 0 Y 242934
Brick hostname207:/mnt/volumename/01/brick 49152 0 Y 63082
Brick hostname204:/mnt/volumename/02/brick 49153 0 Y 66232
Brick hostname205:/mnt/volumename/02/brick 49153 0 Y 242953
Brick hostname207:/mnt/volumename/02/brick 49153 0 Y 63102
Brick hostname204:/mnt/volumename/03/brick 49154 0 Y 66251
Brick hostname205:/mnt/volumename/03/brick 49154 0 Y 242972
Brick hostname207:/mnt/volumename/03/brick 49154 0 Y 63121
Brick hostname204:/mnt/volumename/04/brick 49155 0 Y 66270
Brick hostname205:/mnt/volumename/04/brick 49155 0 Y 242991
Brick hostname207:/mnt/volumename/04/brick 49155 0 Y 63140
Brick hostname204:/mnt/volumename/05/brick 49156 0 Y 66289
Brick hostname205:/mnt/volumename/05/brick 49156 0 Y 243010
Brick hostname207:/mnt/volumename/05/brick 49156 0 Y 63161
Brick hostname204:/mnt/volumename/06/brick 49157 0 Y 66308
Brick hostname205:/mnt/volumename/06/brick 49157 0 Y 243029
Brick hostname207:/mnt/volumename/06/brick 49157 0 Y 63180
Brick hostname204:/mnt/volumename/07/brick 49158 0 Y 66327
Brick hostname205:/mnt/volumename/07/brick 49158 0 Y 243048
Brick hostname207:/mnt/volumename/07/brick 49158 0 Y 63199
Brick hostname204:/mnt/volumename/08/brick 49159 0 Y 66346
Brick hostname205:/mnt/volumename/08/brick 49159 0 Y 243067
Brick hostname207:/mnt/volumename/08/brick 49159 0 Y 63218
Brick hostname204:/mnt/volumename/09/brick 49160 0 Y 66365
Brick hostname205:/mnt/volumename/09/brick 49160 0 Y 243087
Brick hostname207:/mnt/volumename/09/brick 49160 0 Y 63237
Brick hostname204:/mnt/volumename/10/brick 49161 0 Y 66384
Brick hostname205:/mnt/volumename/10/brick 49161 0 Y 243108
Brick hostname207:/mnt/volumename/10/brick 49161 0 Y 63256
Brick hostname204:/mnt/volumename/11/brick 49162 0 Y 66403
Brick hostname205:/mnt/volumename/11/brick 49162 0 Y 243128
Brick hostname207:/mnt/volumename/11/brick 49162 0 Y 63275
Brick hostname204:/mnt/volumename/12/brick 49163 0 Y 66422
Brick hostname205:/mnt/volumename/12/brick 49163 0 Y 243150
Brick hostname207:/mnt/volumename/12/brick 49163 0 Y 63294
Self-heal Daemon on localhost N/A N/A Y 34649
Self-heal Daemon on 10.2.1.241 N/A N/A Y 33192
Self-heal Daemon on hostname207 N/A N/A Y 62991
Self-heal Daemon on hostname204 N/A N/A Y 66125
Self-heal Daemon on hostname206 N/A N/A Y 31613
Self-heal Daemon on hostname205 N/A N/A Y 242842
Task Status of Volume volumename
------------------------------------------------------------------------------
Task : Rebalance
ID : ddf5ecae-c042-48f7-a80f-731bed974c36
Status : in progress
- The output of the gluster volume heal command:
Brick hostname201:/mnt/volumename/01/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/01/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/01/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/02/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/02/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/02/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/03/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/03/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/03/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/04/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/04/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/04/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/05/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/05/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/05/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/06/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/06/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/06/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/07/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/07/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/07/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/08/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/08/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/08/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/09/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/09/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/09/brick
Status: Connected
Number of entries: 0
Brick hostname201:/mnt/volumename/10/brick
Status: Connected
Number of entries: 0
Brick hostname209:/mnt/volumename/10/brick
Status: Connected
Number of entries: 0
Brick hostname206:/mnt/volumename/10/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/01/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/01/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/01/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/02/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/02/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/02/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/03/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/03/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/03/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/04/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/04/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/04/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/05/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/05/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/05/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/06/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/06/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/06/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/07/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/07/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/07/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/08/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/08/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/08/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/09/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/09/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/09/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/10/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/10/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/10/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/11/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/11/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/11/brick
Status: Connected
Number of entries: 0
Brick hostname204:/mnt/volumename/12/brick
Status: Connected
Number of entries: 0
Brick hostname205:/mnt/volumename/12/brick
Status: Connected
Number of entries: 0
Brick hostname207:/mnt/volumename/12/brick
Status: Connected
Number of entries: 0
- Provide logs present on the following locations of client and server nodes: /var/log/glusterfs/
- Is there any crash? Provide the backtrace and coredump: No
Additional info:
- The operating system / glusterfs version: CentOS Linux 7 (Core), glusterfs 9.5
@bingzhangdai It is not normal to see stale file handles on the client, and I don't think it is specific to your version. We need to find the root cause before we can answer your other questions. Could you please provide more information? I see you are running a rebalance; was it a fix-layout? More logs from the client and the bricks would definitely help, and if you can reproduce this with exact steps, even better.
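For example, something along these lines would capture the requested details (a sketch, assuming default log locations; the client log file name is derived from the mount point, /mnt/gfs here, and the brick log name from the brick path):

# on a server: the status output should show whether this was a fix-layout or a full data rebalance
gluster volume rebalance volumename status

# client-side FUSE mount log
grep -iE 'estale|stale' /var/log/glusterfs/mnt-gfs.log

# brick logs on the servers (one log per brick)
grep -iE 'estale|stale' /var/log/glusterfs/bricks/mnt-volumename-01-brick.log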