glusterfs
lseek SEEK_DATA - File descriptor in bad state
Description of problem: lseek SEEK_DATA fails and returns the error EBADF (File descriptor in bad state).
The same lseek call on the file in the brick (instead of the file in the gluster volume) works.
The exact command to reproduce the issue:
# dd if=/dev/urandom of=/hyperfile/cache/read/3/1/test bs=1k count=1000
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB, 1000 KiB) copied, 2.99579 s, 342 kB/s
# ./script.py /hyperfile/cache/read/3/1/test
Traceback (most recent call last):
File "/hyperfile/cache/read/3/1/./script.py", line 18, in <module>
main()
File "/hyperfile/cache/read/3/1/./script.py", line 14, in main
print(os.lseek(input_file.fileno(), offset, os.SEEK_DATA))
OSError: [Errno 77] File descriptor in bad state
The issue and the Python script are the same as in https://github.com/gluster/glusterfs/issues/894. Setting performance.open-behind to off doesn't fix the issue.
#!/usr/bin/env python3
import argparse
import os


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('input_file')
    args = parser.parse_args()

    offset = 0
    # Ask the kernel for the first data region at or after offset 0;
    # on the gluster mount this raises OSError(77) instead.
    with open(args.input_file, 'rb', 0) as input_file:
        print(os.lseek(input_file.fileno(), offset, os.SEEK_DATA))


if __name__ == '__main__':
    main()
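For reference, open-behind was turned off with the usual volume-set command before retesting (write-behind is also off, as shown in the volume info below); a sketch using this report's volume name:
# gluster volume set read-cache performance.open-behind off
# gluster volume set read-cache performance.write-behind off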
The full output of the command that failed: OSError: [Errno 77] File descriptor in bad state
Expected results: lseek SEEK_DATA should set the file offset to the first location in the file, greater than or equal to the given offset, that contains data (for this fully-written file and offset 0, the script should simply print 0).
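For comparison, a minimal sanity check of the expected SEEK_DATA semantics on a local filesystem (my own sketch, not part of the original report; the /tmp path is made up):
#!/usr/bin/env python3
# Sanity check of SEEK_DATA on a local filesystem (sketch, not from the report).
# Creates a sparse file with a 1 MiB hole followed by 4 bytes of data.
import os

path = '/tmp/seek_data_check'        # hypothetical scratch file

with open(path, 'wb') as f:
    f.seek(1024 * 1024)              # leave a 1 MiB hole at the start
    f.write(b'data')

with open(path, 'rb') as f:
    # With SEEK_DATA support this prints the start of the data region
    # (1048576, possibly rounded to a block boundary); filesystems without
    # hole detection simply report 0. It should never raise EBADF/EBADFD.
    print(os.lseek(f.fileno(), 0, os.SEEK_DATA))

os.unlink(path)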
Mandatory info:
- The output of the gluster volume info command:
Volume Name: read-cache
Type: Distributed-Replicate
Volume ID: 80db4cce-2c4b-4233-9c9b-bb77b97cbd61
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: node1:/brick_sdd
Brick2: node2:/brick_sdd
Brick3: node3:/brick_sdd
Brick4: node1:/brick_sde
Brick5: node2:/brick_sde
Brick6: node3:/brick_sde
Options Reconfigured:
diagnostics.client-sys-log-level: INFO
diagnostics.client-log-level: DEBUG
performance.open-behind: off
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.write-behind: off
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
- The output of the gluster volume status command:
Status of volume: read-cache
Gluster process                              TCP Port  RDMA Port  Online  Pid
Brick node1:/brick_sdd                       49154     0          Y       1075721
Brick node2:/brick_sdd                       49154     0          Y       921736
Brick node3:/brick_sdd                       49154     0          Y       2644935
Brick node1:/brick_sde                       49155     0          Y       1075741
Brick node2:/brick_sde                       49155     0          Y       921773
Brick node3:/brick_sde                       49155     0          Y       2644951
Self-heal Daemon on localhost                N/A       N/A        Y       1075587
Quota Daemon on localhost                    N/A       N/A        Y       1075837
Self-heal Daemon on node2                    N/A       N/A        Y       921655
Quota Daemon on node2                        N/A       N/A        Y       921829
Self-heal Daemon on node3                    N/A       N/A        Y       2644849
Quota Daemon on node3                        N/A       N/A        Y       2645032
Task Status of Volume read-cache
There are no active volume tasks
- Provide logs present on following locations of client and server nodes:
In the mount client logs we can see the following errors (I have set the log level to DEBUG):
[2022-09-05 13:56:30.717260 +0000] D [MSGID: 0] [afr-inode-read.c:1824:afr_seek_wind] 0-stack-trace: stack-address: 0x7f042801e1c8, read-cache-replicate-0 returned -1 error: File descriptor in bad state [File descriptor in bad state]
[2022-09-05 13:56:30.717273 +0000] D [MSGID: 0] [io-stats.c:2519:io_stats_seek_cbk] 0-stack-trace: stack-address: 0x7f042801e1c8, read-cache returned -1 error: File descriptor in bad state [File descriptor in bad state]
[2022-09-05 13:56:30.717824 +0000] D [MSGID: 114031] [client-rpc-fops_v2.c:279:client4_0_open_cbk] 0-read-cache-client-1: remote operation failed. [{path=/test}, {gfid=1a1ec7b9-e34d-4caf-87df-4aba3d9a85e7}, {errno=116}, {error=Stale file handle}]
[2022-09-05 13:56:30.717846 +0000] D [MSGID: 0] [client-rpc-fops_v2.c:283:client4_0_open_cbk] 0-stack-trace: stack-address: 0x7f0428008068, read-cache-client-1 returned -1 error: Stale file handle [Stale file handle]
[2022-09-05 13:56:30.717865 +0000] D [MSGID: 108009] [afr-open.c:220:afr_openfd_fix_open_cbk] 0-read-cache-replicate-0: Failed to open subvolume [{path=/test}, {subvolume=read-cache-client-1}, {errno=116}, {error=Stale file handle}]
[2022-09-05 13:56:30.718006 +0000] D [MSGID: 114031] [client-rpc-fops_v2.c:279:client4_0_open_cbk] 0-read-cache-client-0: remote operation failed. [{path=/test}, {gfid=1a1ec7b9-e34d-4caf-87df-4aba3d9a85e7}, {errno=116}, {error=Stale file handle}]
[2022-09-05 13:56:30.718023 +0000] D [MSGID: 0] [client-rpc-fops_v2.c:283:client4_0_open_cbk] 0-stack-trace: stack-address: 0x7f0428008068, read-cache-client-0 returned -1 error: Stale file handle [Stale file handle]
[2022-09-05 13:56:30.718040 +0000] D [MSGID: 108009] [afr-open.c:220:afr_openfd_fix_open_cbk] 0-read-cache-replicate-0: Failed to open subvolume [{path=/test}, {subvolume=read-cache-client-0}, {errno=116}, {error=Stale file handle}]
- The operating system / glusterfs version: glusterfs 9.4 on Rocky Linux 8.6
One more really odd behavior of gluster related to SEEK_DATA (see below). This issue appeared suddenly; since then, SEEK_DATA fails on any file that is renamed to "test3".
# mount -v | grep mnt
10.130.42.84:/read-cache on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
# ls /mnt
info sparse stat.db subvol test
# cp /mnt/test /mnt/test3
# /root/script.py /mnt/test
0
# /root/script.py /mnt/test3
Traceback (most recent call last):
File "/root/script.py", line 19, in <module>
main()
File "/root/script.py", line 14, in main
print(os.lseek(input_file.fileno(), offset, os.SEEK_DATA))
OSError: [Errno 77] File descriptor in bad state
@handrea2009 - so it may or may not fail, based on the file name?
I can't say for sure, but currently on my test system SEEK_DATA fails on any file named test3. I have also recreated the gluster volume (stop, delete, create, start) and SEEK_DATA still fails on a file named test3. It also fails on a file named test30, but works on a file named test43.
Yes, so I think we might be seeing the same issue - the SEEK FOP for some reason going to the wrong brick.
And as @xhernandez pointed out, this looks like https://github.com/gluster/glusterfs/issues/3373.
Indeed my logs seem to confirm it:
[2022-09-06 13:06:47.928358 +0000] T [MSGID: 0] [utime-autogen-fops.c:140:gf_utime_open] 0-stack-trace: stack-address: 0x7f8e6c01d638, winding from read-cache-utime to read-cache-dht
[2022-09-06 13:06:47.928366 +0000] T [MSGID: 0] [dht-inode-read.c:135:dht_open] 0-stack-trace: stack-address: 0x7f8e6c01d638, winding from read-cache-dht to read-cache-replicate-1
[2022-09-06 13:06:47.928942 +0000] T [MSGID: 0] [dht-inode-read.c:65:dht_open_cbk] 0-stack-trace: stack-address: 0x7f8e6c01d638, read-cache-dht returned 0
[2022-09-06 13:06:47.929087 +0000] T [MSGID: 0] [defaults.c:3122:default_seek] 0-stack-trace: stack-address: 0x7f8e6c0027b8, winding from read-cache-utime to read-cache-dht
[2022-09-06 13:06:47.929095 +0000] T [MSGID: 0] [defaults.c:3122:default_seek] 0-stack-trace: stack-address: 0x7f8e6c0027b8, winding from read-cache-dht to read-cache-replicate-0
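If I read the trace correctly (my interpretation, tying it to #3373): dht_open wound the open to read-cache-replicate-1, but the seek went through default_seek in both utime and dht, and the default implementation winds to the first child, read-cache-replicate-0, which never opened the fd. A toy sketch of that mismatch, not GlusterFS code, just to illustrate why the outcome depends on which subvolume the file name hashes to:
#!/usr/bin/env python3
# Toy model only -- an assumption based on the trace above and issue #3373,
# not GlusterFS source code: open() picks a child subvolume by a name hash,
# while a translator without its own seek() winds the seek to its first child.
import zlib


class Subvolume:
    def __init__(self, name):
        self.name = name
        self.open_fds = set()

    def open(self, path):
        self.open_fds.add(path)

    def seek(self, path):
        if path not in self.open_fds:
            # what a brick reports when it never saw the open
            raise OSError(77, 'File descriptor in bad state')
        return 0


children = [Subvolume('replicate-0'), Subvolume('replicate-1')]


def hashed_child(path):
    # stand-in for gluster's directory-layout hash; it only illustrates
    # that the chosen child depends on the file name
    return children[zlib.crc32(path.encode()) % len(children)]


def default_seek(path):
    # default fop behaviour: wind to the first child, ignoring the layout
    return children[0].seek(path)


for path in ('/test', '/test3', '/test30', '/test43'):
    hashed_child(path).open(path)
    try:
        default_seek(path)
        print(path, 'ok')
    except OSError as exc:
        print(path, exc)   # fails whenever the hash picked replicate-1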
My understanding is that the issue appears when you have more than one subvolume, so it shouldn't be present when you have only one brick per volume. If so, I could use only one brick per volume as a workaround.
With just a single DHT subvolume (i.e. a pure replicated volume, not a distributed-replicated one), this problem shouldn't happen.
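For anyone wanting to try that workaround, a pure 1 x 3 replicated volume (no DHT distribution) would be created with something like the following; the volume name is hypothetical, the brick paths are reused from the volume info above, and existing data would have to be copied over:
# gluster volume create read-cache-replica replica 3 node1:/brick_sdd node2:/brick_sdd node3:/brick_sdd
# gluster volume start read-cache-replica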
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there has been no update since the last update on the issue. If this issue is still valid, feel free to reopen it.