lseek SEEK_DATA - File descriptor in bad state

Open handrea2009 opened this issue 2 years ago • 8 comments

Description of problem: lseek with SEEK_DATA fails and returns EBADFD (errno 77, "File descriptor in bad state").

The same lseek on the file in the brick (instead of the file on the gluster mount) works; see the sketch after the failing traceback below.

The exact command to reproduce the issue:

# dd if=/dev/urandom of=/hyperfile/cache/read/3/1/test bs=1k count=1000
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB, 1000 KiB) copied, 2.99579 s, 342 kB/s

# ./script.py /hyperfile/cache/read/3/1/test
Traceback (most recent call last):
  File "/hyperfile/cache/read/3/1/./script.py", line 18, in <module>
    main()
  File "/hyperfile/cache/read/3/1/./script.py", line 14, in main
    print(os.lseek(input_file.fileno(), offset, os.SEEK_DATA))
OSError: [Errno 77] File descriptor in bad state
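
For comparison, a sketch of the brick-side check mentioned above (the path is illustrative; the actual brick and sub-path depend on which replica set and node hold the file):

# ./script.py /brick_sdd/<path-under-volume>/test
0

For a file written entirely with data, SEEK_DATA from offset 0 succeeds and simply returns 0.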

The issue and the Python script are the same as in https://github.com/gluster/glusterfs/issues/894. Setting performance.open-behind to off doesn't fix the issue.

#!/usr/bin/env python3

import argparse
import os

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('input_file')
    args = parser.parse_args()

    offset = 0

    with open(args.input_file, 'rb', 0) as input_file:  # unbuffered binary open
        # SEEK_DATA: move to the first offset >= `offset` that contains data
        print(os.lseek(input_file.fileno(), offset, os.SEEK_DATA))


if __name__ == '__main__':
    main()

The full output of the command that failed: OSError: [Errno 77] File descriptor in bad state

Expected results: lseek SEEK_DATA should adjust the file offset to the next location in the file greater than or equal to offset containing data
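
For reference, a minimal sketch of the expected behaviour on a sparse file, assuming a local filesystem that supports SEEK_DATA (the path and sizes are illustrative, and the returned offset may be rounded down to a block or extent boundary):

# truncate -s 1M /tmp/sparse_test
# dd if=/dev/urandom of=/tmp/sparse_test bs=1k count=1 seek=512 conv=notrunc
# ./script.py /tmp/sparse_test

The script should print roughly 524288 (the start of the only written extent); on the affected gluster mount it fails with errno 77 instead.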

Mandatory info:

- The output of the gluster volume info command:

Volume Name: read-cache
Type: Distributed-Replicate
Volume ID: 80db4cce-2c4b-4233-9c9b-bb77b97cbd61
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: node1:/brick_sdd
Brick2: node2:/brick_sdd
Brick3: node3:/brick_sdd
Brick4: node1:/brick_sde
Brick5: node2:/brick_sde
Brick6: node3:/brick_sde
Options Reconfigured:
diagnostics.client-sys-log-level: INFO
diagnostics.client-log-level: DEBUG
performance.open-behind: off
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.write-behind: off
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

- The output of the gluster volume status command:

Status of volume: read-cache
Gluster process                            TCP Port  RDMA Port  Online  Pid
Brick node1:/brick_sdd                     49154     0          Y       1075721
Brick node2:/brick_sdd                     49154     0          Y       921736
Brick node3:/brick_sdd                     49154     0          Y       2644935
Brick node1:/brick_sde                     49155     0          Y       1075741
Brick node2:/brick_sde                     49155     0          Y       921773
Brick node3:/brick_sde                     49155     0          Y       2644951
Self-heal Daemon on localhost              N/A       N/A        Y       1075587
Quota Daemon on localhost                  N/A       N/A        Y       1075837
Self-heal Daemon on node2                  N/A       N/A        Y       921655
Quota Daemon on node2                      N/A       N/A        Y       921829
Self-heal Daemon on node3                  N/A       N/A        Y       2644849
Quota Daemon on node3                      N/A       N/A        Y       2645032

Task Status of Volume read-cache

There are no active volume tasks

- Provide logs present on following locations of client and server nodes -

In the client mount logs we can see the following errors (I have set the log level to debug):

[2022-09-05 13:56:30.717260 +0000] D [MSGID: 0] [afr-inode-read.c:1824:afr_seek_wind] 0-stack-trace: stack-address: 0x7f042801e1c8, read-cache-replicate-0 returned -1 error: File descriptor in bad state [File descriptor in bad state]
[2022-09-05 13:56:30.717273 +0000] D [MSGID: 0] [io-stats.c:2519:io_stats_seek_cbk] 0-stack-trace: stack-address: 0x7f042801e1c8, read-cache returned -1 error: File descriptor in bad state [File descriptor in bad state]
[2022-09-05 13:56:30.717824 +0000] D [MSGID: 114031] [client-rpc-fops_v2.c:279:client4_0_open_cbk] 0-read-cache-client-1: remote operation failed. [{path=/test}, {gfid=1a1ec7b9-e34d-4caf-87df-4aba3d9a85e7}, {errno=116}, {error=Stale file handle}]
[2022-09-05 13:56:30.717846 +0000] D [MSGID: 0] [client-rpc-fops_v2.c:283:client4_0_open_cbk] 0-stack-trace: stack-address: 0x7f0428008068, read-cache-client-1 returned -1 error: Stale file handle [Stale file handle]
[2022-09-05 13:56:30.717865 +0000] D [MSGID: 108009] [afr-open.c:220:afr_openfd_fix_open_cbk] 0-read-cache-replicate-0: Failed to open subvolume [{path=/test}, {subvolume=read-cache-client-1}, {errno=116}, {error=Stale file handle}]
[2022-09-05 13:56:30.718006 +0000] D [MSGID: 114031] [client-rpc-fops_v2.c:279:client4_0_open_cbk] 0-read-cache-client-0: remote operation failed. [{path=/test}, {gfid=1a1ec7b9-e34d-4caf-87df-4aba3d9a85e7}, {errno=116}, {error=Stale file handle}]
[2022-09-05 13:56:30.718023 +0000] D [MSGID: 0] [client-rpc-fops_v2.c:283:client4_0_open_cbk] 0-stack-trace: stack-address: 0x7f0428008068, read-cache-client-0 returned -1 error: Stale file handle [Stale file handle]
[2022-09-05 13:56:30.718040 +0000] D [MSGID: 108009] [afr-open.c:220:afr_openfd_fix_open_cbk] 0-read-cache-replicate-0: Failed to open subvolume [{path=/test}, {subvolume=read-cache-client-0}, {errno=116}, {error=Stale file handle}]

- The operating system / glusterfs version: glusterfs 9.4 on Rocky Linux 8.6

handrea2009 · Sep 05 '22 14:09

One more really odd gluster behavior related to SEEK_DATA (see below). The issue appeared suddenly, and since then any file renamed to "test3" fails SEEK_DATA.

# mount -v | grep mnt
10.130.42.84:/read-cache on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

# ls /mnt
info  sparse  stat.db  subvol  test

# cp /mnt/test /mnt/test3

# /root/script.py /mnt/test
0

# /root/script.py /mnt/test3
Traceback (most recent call last):
  File "/root/script.py", line 19, in <module>
    main()
  File "/root/script.py", line 14, in main
    print(os.lseek(input_file.fileno(), offset, os.SEEK_DATA))
OSError: [Errno 77] File descriptor in bad state

handrea2009 · Sep 06 '22 08:09

@handrea2009 - so it may or may not fail, based on the file name?

mykaul · Sep 06 '22 13:09

I can't say for sure, but currently on my test system SEEK_DATA fails on any file named test3. I have also recreated the gluster volume (stop, delete, create, start) and SEEK_DATA on a file named test3 still fails. It also fails on a file named test30, but works on a file named test43.

handrea2009 · Sep 06 '22 13:09

> I can't say for sure, but currently on my test system SEEK_DATA fails on any file named test3. I have also recreated the gluster volume (stop, delete, create, start) and SEEK_DATA on a file named test3 still fails. It also fails on a file named test30, but works on a file named test43.

Yes, so I think we might be seeing the same issue - the SEEK FOP for some reason going to the wrong brick.

mykaul · Sep 06 '22 13:09

And as @xhernandez pointed out, this looks like https://github.com/gluster/glusterfs/issues/3373.

mykaul · Sep 06 '22 13:09

> And as @xhernandez pointed out, this looks like #3373.

Indeed my logs seem to confirm it (the open below is wound to read-cache-replicate-1, while the subsequent seek is wound to read-cache-replicate-0):

[2022-09-06 13:06:47.928358 +0000] T [MSGID: 0] [utime-autogen-fops.c:140:gf_utime_open] 0-stack-trace: stack-address: 0x7f8e6c01d638, winding from read-cache-utime to read-cache-dht
[2022-09-06 13:06:47.928366 +0000] T [MSGID: 0] [dht-inode-read.c:135:dht_open] 0-stack-trace: stack-address: 0x7f8e6c01d638, winding from read-cache-dht to read-cache-replicate-1
[2022-09-06 13:06:47.928942 +0000] T [MSGID: 0] [dht-inode-read.c:65:dht_open_cbk] 0-stack-trace: stack-address: 0x7f8e6c01d638, read-cache-dht returned 0
[2022-09-06 13:06:47.929087 +0000] T [MSGID: 0] [defaults.c:3122:default_seek] 0-stack-trace: stack-address: 0x7f8e6c0027b8, winding from read-cache-utime to read-cache-dht
[2022-09-06 13:06:47.929095 +0000] T [MSGID: 0] [defaults.c:3122:default_seek] 0-stack-trace: stack-address: 0x7f8e6c0027b8, winding from read-cache-dht to read-cache-replicate-0

handrea2009 · Sep 06 '22 14:09

My understanding is that the issue appears when you have more than one subvolume, so it shouldn't be present when you have only one brick per volume. If so, I could use only one brick per volume as a work-around.

handrea2009 · Sep 06 '22 14:09

> My understanding is that the issue appears when you have more than one subvolume, so it shouldn't be present when you have only one brick per volume. If so, I could use only one brick per volume as a work-around.

With just a single DHT subvolume (i.e. a pure replicated volume, not a distributed-replicated one), this problem shouldn't happen.
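
For anyone trying that work-around, a rough sketch of recreating read-cache as a pure replica 3 volume (a single DHT subvolume), assuming freshly prepared brick directories; reusing the old brick paths may require cleaning up their extended attributes or adding "force" to the create command:

# gluster volume stop read-cache
# gluster volume delete read-cache
# gluster volume create read-cache replica 3 node1:/brick_sdd node2:/brick_sdd node3:/brick_sdd
# gluster volume start read-cache

Note that this keeps only one set of bricks, so the resulting volume has roughly half the capacity of the original 3 x 2 distributed-replicate layout.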

xhernandez · Sep 08 '22 10:09

Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] · May 21 '23 18:05

Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.

stale[bot] · Jun 11 '23 09:06