
Glusterfs v10.4 'No space left on device' yet we have plenty of space on all nodes

Open brandonshoemakerTH opened this issue 1 year ago • 38 comments

Description of problem: We are seeing an 'error=No space left on device' issue on GlusterFS 10.4 on AlmaLinux 8 (4.18.0-425.19.2.el8_7.x86_64), even though we currently have 61 TB available on the volume and each of the 12 nodes has 2-8 TB free, so we are nowhere near out of space on any node.

#example log msg from /var/log/glusterfs/home-volbackups.log
[2023-05-06 23:47:38.645324 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-volbackups-client-23: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-05-06 23:47:38.645376 +0000] W [fuse-bridge.c:1970:fuse_err_cbk] 0-glusterfs-fuse: 980901423: FLUSH() ERR => -1 (No space left on device)

The exact command to reproduce the issue: We have used vsftpd and GlusterFS for around 8 years for FTP uploads of backup files, and for around 3 years for NFS uploads. Each GlusterFS node has a single brick, mounts the single distributed volume locally as a GlusterFS client, and receives ftp > vsftpd > glusterfs backup files to the volume each weekend. After about 24 hours of FTP uploads, the 'no space' errors start appearing in the logs and writes begin to fail. However, we have plenty of space on all nodes and we are using the 'cluster.min-free-disk: 1GB' volume setting. If we reboot all the GlusterFS nodes the problem goes away for a while, but it returns after ~12-24 hours.

The full output of the command that failed: Here is an example FTP backup file upload that failed this weekend:

put: 125ac755-05b1-4d48-9a7d-96e7cd423700-vda.bak: Access failed: 553 Could not create file. (125ac755-05b1-4d48-9a7d-96e7cd423700-vda.qcow2)

Here are some example NFS backup file writes that failed last weekend:

/bin/cp: failed to close '/backups/instance-00016239.xml': No space left on device
/bin/cp: failed to close '/backups/instance-00016221.xml': No space left on device
/bin/cp: failed to close '/backups/instance-00016248.xml': No space left on device
/bin/cp: failed to close '/backups/instance-0001625a.xml': No space left on device
qemu-img: error while writing sector 19931136: No space left on device
qemu-img: Failed to flush the L2 table cache: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device
qemu-img: /backups/2699ee2f-92b8-4804-a7c7-1dc4e2abed29-vda.qcow2: error while converting qcow2: Could not close the new file: No space left on device
/bin/cp: failed to close '/backups/73fa3986-f450-4b36-b7d4-dcbdcd494562-instance-0001609e-disk.config': No space left on device
/bin/cp: failed to close '/backups/instance-00016104.xml': No space left on device
/bin/cp: failed to close '/backups/5c82fbdb-2be7-45fe-871d-604453868edc-instance-000160f2-disk.config': No space left on device
/bin/cp: failed to close '/backups/24acc824-94d5-4026-9abe-072a1b257cc0-instance-00016119-disk.info': No space left on device
/bin/cp: failed to close '/backups/instance-0001611f.xml': No space left on device
/bin/cp: failed to close '/backups/instance-0001613d.xml': No space left on device

Expected results: It is expected for ftp and nfs upload writes to succeed as they have in the past.

Mandatory info:

- The output of the gluster volume info command:

[root@nybaknode1 ~]# gluster volume info volbackups

Volume Name: volbackups
Type: Distribute
Volume ID: cd40794d-ab74-4706-a0bc-3e95bb8c63a2
Status: Started
Snapshot Count: 0
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: nybaknode9.domain.net:/lvbackups/brick
Brick2: nybaknode11.domain.net:/lvbackups/brick
Brick3: nybaknode2.domain.net:/lvbackups/brick
Brick4: nybaknode3.domain.net:/lvbackups/brick
Brick5: nybaknode4.domain.net:/lvbackups/brick
Brick6: nybaknode12.domain.net:/lvbackups/brick
Brick7: nybaknode5.domain.net:/lvbackups/brick
Brick8: nybaknode6.domain.net:/lvbackups/brick
Brick9: nybaknode7.domain.net:/lvbackups/brick
Brick10: nybaknode8.domain.net:/lvbackups/brick
Brick11: nybaknode10.domain.net:/lvbackups/brick
Brick12: nybaknode1.domain.net:/lvbackups/brick
Options Reconfigured:
performance.cache-size: 256MB
server.event-threads: 16
performance.io-thread-count: 32
performance.client-io-threads: on
client.event-threads: 16
diagnostics.brick-sys-log-level: WARNING
diagnostics.brick-log-level: WARNING
performance.cache-max-file-size: 2MB
transport.address-family: inet
nfs.disable: on
cluster.min-free-disk: 1GB
[root@nybaknode1 ~]#

- The output of the gluster volume status command:

[root@nybaknode1 ~]# gluster volume status volbackups
Status of volume: volbackups
Gluster process                                  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nybaknode9.domain.net:/lvbackups/brick     59026     0          Y       1986
Brick nybaknode11.domain.net:/lvbackups/brick    60172     0          Y       2033
Brick nybaknode2.domain.net:/lvbackups/brick     58067     0          Y       1579
Brick nybaknode3.domain.net:/lvbackups/brick     58210     0          Y       1603
Brick nybaknode4.domain.net:/lvbackups/brick     52719     0          Y       1681
Brick nybaknode12.domain.net:/lvbackups/brick    52193     0          Y       1895
Brick nybaknode5.domain.net:/lvbackups/brick     53655     0          Y       1667
Brick nybaknode6.domain.net:/lvbackups/brick     56614     0          Y       1591
Brick nybaknode7.domain.net:/lvbackups/brick     49492     0          Y       1719
Brick nybaknode8.domain.net:/lvbackups/brick     51497     0          Y       1701
Brick nybaknode10.domain.net:/lvbackups/brick    49787     0          Y       1878
Brick nybaknode1.domain.net:/lvbackups/brick     52392     0          Y       1781

Task Status of Volume volbackups

Task                 : Rebalance
ID                   : 1ea52278-ea1b-4d7e-857a-fe2ee1dc5420
Status               : completed

[root@nybaknode1 ~]#

- The output of the gluster volume heal command:

Not relevant. We are using a plain distributed volume with no replica.

- The output of the gluster volume status detail command:

[root@nybaknode1 ~]# gluster volume status volbackups detail
Status of volume: volbackups

Brick                : Brick nybaknode9.domain.net:/lvbackups/brick
TCP Port             : 59026
RDMA Port            : 0
Online               : Y
Pid                  : 1986
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 4.6TB
Total Disk Space     : 29.0TB
Inode Count          : 3108974976
Free Inodes          : 3108903409

Brick                : Brick nybaknode11.domain.net:/lvbackups/brick
TCP Port             : 60172
RDMA Port            : 0
Online               : Y
Pid                  : 2033
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 8.2TB
Total Disk Space     : 43.5TB
Inode Count          : 4672138432
Free Inodes          : 4672063970

Brick                : Brick nybaknode2.domain.net:/lvbackups/brick
TCP Port             : 58067
RDMA Port            : 0
Online               : Y
Pid                  : 1579
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 5.4TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108849261

Brick                : Brick nybaknode3.domain.net:/lvbackups/brick
TCP Port             : 58210
RDMA Port            : 0
Online               : Y
Pid                  : 1603
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 4.6TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108849248

Brick                : Brick nybaknode4.domain.net:/lvbackups/brick
TCP Port             : 52719
RDMA Port            : 0
Online               : Y
Pid                  : 1681
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 5.0TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108848785

Brick                : Brick nybaknode12.domain.net:/lvbackups/brick
TCP Port             : 52193
RDMA Port            : 0
Online               : Y
Pid                  : 1895
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 7.5TB
Total Disk Space     : 43.5TB
Inode Count          : 4671718976
Free Inodes          : 4671644748

Brick                : Brick nybaknode5.domain.net:/lvbackups/brick
TCP Port             : 53655
RDMA Port            : 0
Online               : Y
Pid                  : 1667
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 3.3TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108849458

Brick                : Brick nybaknode6.domain.net:/lvbackups/brick
TCP Port             : 56614
RDMA Port            : 0
Online               : Y
Pid                  : 1591
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota
Inode Size           : 512
Disk Space Free      : 5.4TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108849533

Brick                : Brick nybaknode7.domain.net:/lvbackups/brick
TCP Port             : 49492
RDMA Port            : 0
Online               : Y
Pid                  : 1719
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 2.4TB
Total Disk Space     : 14.4TB
Inode Count          : 1546333376
Free Inodes          : 1546264508

Brick                : Brick nybaknode8.domain.net:/lvbackups/brick
TCP Port             : 51497
RDMA Port            : 0
Online               : Y
Pid                  : 1701
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=128,noquota
Inode Size           : 512
Disk Space Free      : 4.4TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108849200

Brick                : Brick nybaknode10.domain.net:/lvbackups/brick
TCP Port             : 49787
RDMA Port            : 0
Online               : Y
Pid                  : 1878
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 6.7TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108850142

Brick                : Brick nybaknode1.domain.net:/lvbackups/brick
TCP Port             : 52392
RDMA Port            : 0
Online               : Y
Pid                  : 1781
File System          : xfs
Device               : /dev/mapper/vgbackups-lvbackups
Mount Options        : rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=128,noquota
Inode Size           : 512
Disk Space Free      : 6.6TB
Total Disk Space     : 29.0TB
Inode Count          : 3108921344
Free Inodes          : 3108850426

[root@nybaknode1 ~]#

- Provide logs present on following locations of client and server nodes: /var/log/glusterfs/

Attached are sanitized logs from one of the affected Gluster nodes that experienced the issue today and last week. If you need more logs, please let us know; we are willing to share more directly with someone. We have 12 GlusterFS nodes in this location for our backups.

- Is there any crash? Provide the backtrace and coredump

No crash is involved as far as I know.

Additional info:

We are seeing the 'error=No space left on device' issue on GlusterFS 10.4 on AlmaLinux 8 (4.18.0-425.19.2.el8_7.x86_64) and are hoping someone can advise, as this has become critical: we use GlusterFS for backups of the entire infrastructure for the affected location (NYC). We have another, similarly configured location on 10.3 that is not yet experiencing this issue, but it is about 60% smaller by number of nodes.

We have been running this 12-node GlusterFS (plain) distributed vsftpd backup cluster for years (it is not new), and 3-4 weeks ago we upgraded from v9 to v10.4. I do not know if the upgrade is related to this new issue.

We are seeing the new 'error=No space left on device' error below in the logs on multiple Gluster v10.4 nodes. At the moment it appears on about half the nodes: 5 out of 12 last week, and 2 more today before I rebooted. The issue goes away if we reboot all the GlusterFS nodes, but backups take a little over 2 days to complete each weekend, and the issue returns after about 1 day of backups running, before the backup cycle is complete. It has happened on each of the last 3 weekends we have run backups to these nodes.

#example log msg from /var/log/glusterfs/home-volbackups.log
[2023-05-06 23:47:38.645324 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-volbackups-client-23: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-05-06 23:47:38.645376 +0000] W [fuse-bridge.c:1970:fuse_err_cbk] 0-glusterfs-fuse: 980901423: FLUSH() ERR => -1 (No space left on device)
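A quick way to see which nodes are currently affected is to count these warnings in the client mount log on each node (log path as in the example above; a sketch, not exact output):

```bash
# count ENOSPC warnings in the fuse client mount log on this node
grep -c 'No space left on device' /var/log/glusterfs/home-volbackups.log

# show the most recent occurrences with their timestamps
grep 'No space left on device' /var/log/glusterfs/home-volbackups.log | tail -n 5
```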

Each GlusterFS node has a single brick, mounts the single distributed volume locally as a GlusterFS client, and receives our backup files to the volume over FTP and over nfs-ganesha each weekend. This weekend we tested with FTP uploads only, and the problem happened the same with or without nfs-ganesha backup file uploads.

We distribute the FTP upload load between the servers through a combination of /etc/hosts entries and AWS weighted DNS. We also use nfs-ganesha, but this weekend we ran FTP backup uploads only, as a test to rule out nfs-ganesha, and experienced the same issue with FTP uploads alone.

We currently have 61 TB available on the volume, and each of the 12 nodes has 2-8 TB free, so we are nowhere near out of space on any node.

We have already tried changing 'cluster.min-free-disk: 1%' to 'cluster.min-free-disk: 1GB' and rebooting all the Gluster nodes to refresh them, and it happened again. That idea was mentioned in https://access.redhat.com/solutions/276483.
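For reference, that option change amounts to the standard volume-set command, and the active value can be confirmed with volume get:

```bash
# switch the per-brick minimum-free-space threshold from a percentage to an absolute size
gluster volume set volbackups cluster.min-free-disk 1GB

# confirm the value currently in effect
gluster volume get volbackups cluster.min-free-disk
```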

Does anyone know what we might check next?

Crossposted to https://lists.gluster.org/pipermail/gluster-users/2023-May/040289.html

- The operating system / glusterfs version:

AlmaLinux 8, kernel 4.18.0-425.19.2.el8_7.x86_64

[root@nybaknode1 ~]# rpm -qa | grep -E 'gluster|nfs'
nfs-ganesha-selinux-3.5-3.el8.noarch
glusterfs-client-xlators-10.4-1.el8s.x86_64
nfs-ganesha-utils-3.5-3.el8.x86_64
glusterfs-selinux-2.0.1-1.el8s.noarch
libglusterd0-10.4-1.el8s.x86_64
nfs-ganesha-gluster-3.5-3.el8.x86_64
libnfsidmap-2.3.3-57.el8_7.1.x86_64
libglusterfs0-10.4-1.el8s.x86_64
glusterfs-cli-10.4-1.el8s.x86_64
glusterfs-server-10.4-1.el8s.x86_64
nfs-ganesha-3.5-3.el8.x86_64
centos-release-nfs-ganesha30-1.0-2.el8.noarch
glusterfs-fuse-10.4-1.el8s.x86_64
sssd-nfs-idmap-2.7.3-4.el8_7.3.x86_64
centos-release-gluster10-1.0-1.el8.noarch
glusterfs-10.4-1.el8s.x86_64
nfs-utils-2.3.3-57.el8_7.1.x86_64
[root@nybaknode1 ~]#

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

logs-screenshots-sanitized.zip

brandonshoemakerTH avatar May 07 '23 01:05 brandonshoemakerTH

In release 10.4 we recently changed the code path to respect the storage.reserve value, via the patch discussed in https://github.com/gluster/glusterfs/issues/3636; that is why you are facing this issue. For the time being, I would suggest downgrading GlusterFS to release 10.3 to avoid it. I will try to fix it.
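To see what reserve value a volume is currently running with (the default for storage.reserve is 1, interpreted as a percentage of brick size), you can check with the standard CLI:

```bash
# show the effective storage.reserve setting for the volume
gluster volume get volbackups storage.reserve
```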

mohit84 avatar May 07 '23 02:05 mohit84

Can you please take a statedump of any one brick process that is currently throwing the "No space left on device" error? To take a statedump, send a SIGUSR1 signal to the brick process ("kill -SIGUSR1 <brick_pid>"); it will generate a statedump in the /var/run/gluster directory.
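For example, on one node (the pid here is the nybaknode9 brick pid from the volume status output above; adjust per node):

```bash
# list brick process pids (also shown in `gluster volume status volbackups`)
pgrep -fa glusterfsd

# ask the brick process to dump its internal state
kill -SIGUSR1 1986

# the statedump file appears under /var/run/gluster
ls -lt /var/run/gluster | head
```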

mohit84 avatar May 07 '23 02:05 mohit84

Hi @mohit84, thanks so much for the prompt reply and advice. I had to reboot all the nodes just before I posted this issue to clear it, so it will take another 12-24 hours before we see the issue reoccur, but it will, and I will come back with the requested statedump.

Can you point me to any docs, or advise on the basic approach to follow, for a downgrade to 10.3 on RHEL8/AlmaLinux 8? Is it a reliable procedure? Unfortunately, I'm not familiar with what this would entail, and this is a 12-node, 362 TB backup volume. 'yum downgrade [glusterfs-server-pkg]' does not offer anything, so it seems it would be a more manual process.

brandonshoemakerTH avatar May 07 '23 03:05 brandonshoemakerTH

Re-opening. Sorry, it seems the issue was closed by my last reply.

brandonshoemakerTH avatar May 07 '23 03:05 brandonshoemakerTH

> Hi @mohit84, thanks so much for the prompt reply and advice. I had to reboot all the nodes just before I posted this issue to clear it, so it will take another 12-24 hours before we see the issue reoccur, but it will, and I will come back with the requested statedump.
>
> Can you point me to any docs, or advise on the basic approach to follow, for a downgrade to 10.3 on RHEL8/AlmaLinux 8? Is it a reliable procedure? Unfortunately, I'm not familiar with what this would entail, and this is a 12-node, 362 TB backup volume. 'yum downgrade [glusterfs-server-pkg]' does not offer anything, so it seems it would be a more manual process.

The downgrade procedure is similar to the upgrade; you need to follow the same process. Yes, it is completely safe.
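For RHEL8/AlmaLinux 8, a minimal sketch of a per-node downgrade with yum, assuming the 10.3 rpms are still reachable from a configured repository (the 10.3-1.el8s release string is an assumption based on the 10.4-1.el8s packages listed above):

```bash
# stop gluster services on the node being downgraded
systemctl stop glusterd
pkill glusterfsd

# downgrade the gluster packages (requires the 10.3 rpms in a reachable repo)
yum downgrade glusterfs-server-10.3-1.el8s glusterfs-10.3-1.el8s \
    glusterfs-fuse-10.3-1.el8s glusterfs-cli-10.3-1.el8s \
    glusterfs-client-xlators-10.3-1.el8s libglusterfs0-10.3-1.el8s \
    libglusterd0-10.3-1.el8s

# restart and verify before moving to the next node
systemctl start glusterd
gluster --version
```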

mohit84 avatar May 07 '23 04:05 mohit84

> Hi @mohit84, thanks so much for the prompt reply and advice. I had to reboot all the nodes just before I posted this issue to clear it, so it will take another 12-24 hours before we see the issue reoccur, but it will, and I will come back with the requested statedump. Can you point me to any docs, or advise on the basic approach to follow, for a downgrade to 10.3 on RHEL8/AlmaLinux 8? Is it a reliable procedure? Unfortunately, I'm not familiar with what this would entail, and this is a 12-node, 362 TB backup volume. 'yum downgrade [glusterfs-server-pkg]' does not offer anything, so it seems it would be a more manual process.
>
> The downgrade procedure is similar to the upgrade; you need to follow the same process. Yes, it is completely safe.

You can try it once in a test environment if you are hesitant to try it in the production environment.

mohit84 avatar May 07 '23 04:05 mohit84

OK, yes, I will set up a test server to test it. I will look for the 10.3 packages tomorrow, as it is midnight here. Thanks for the advice.

brandonshoemakerTH avatar May 07 '23 05:05 brandonshoemakerTH

Hi @mohit84

I have the statedump file now, as the issue happened again in the last hour. I think I have sanitized the file and removed our domain references. Is there anything else in this file that might be sensitive besides the domain/hostname references? It is 230215 lines, so I am not able to check it all and be sure.

Is it possible to send this to you privately somehow, or only through a public reply post here? Or are the hostname and directory path references the only sensitive things in the file?

brandonshoemakerTH avatar May 08 '23 05:05 brandonshoemakerTH

Yes, you can share it on my mail id [email protected].

mohit84 avatar May 08 '23 05:05 mohit84

Thanks @mohit84, we sent the statedump and other log files. We downgraded to 10.3 this morning and will re-run our backups to these GlusterFS 10.3 servers. Do let us know if we can assist your team with anything else regarding this issue. We will report back in a few days, after the backups hopefully complete without re-encountering the issue.

brandonshoemakerTH avatar May 08 '23 16:05 brandonshoemakerTH

Since updating to version 10.4, we have been facing the same issue. After a couple of hours, we receive the error message 'No space left on device', and we have to restart all three GlusterFS nodes. After that, it works for the next couple of hours until we encounter the same issue again.

eg-ops avatar May 11 '23 05:05 eg-ops

@mohit84 Our backups sent to these glusterfs nodes did complete after 2 days running without encountering the issue again after downgrading to 10.3. We appreciate your help on this issue.

@eg-ops you should consider the same downgrade to 10.3. From the testing we just did, it does seem to be an issue in 10.4 that does not affect 10.3.

brandonshoemakerTH avatar May 11 '23 15:05 brandonshoemakerTH

Hi there, we are currently experiencing the same issue with 10.4. Unfortunately, we can't find the 10.3 package for Ubuntu (specifically Ubuntu 18.04 Bionic). It would be awesome to get some hints on where to get the packages!

FleloShe avatar May 22 '23 09:05 FleloShe

@brandonshoemakerTH @eg-ops @FleloShe do you create hard-linked files in the Gluster volume, where at least one of the hardlinks gets deleted regularly?

xhernandez avatar May 25 '23 07:05 xhernandez

@xhernandez no hardlinks are used by us. @FleloShe sorry, I'm not so familiar with Gluster packages on Ubuntu.

In the last 2 weeks we have not seen the issue re-occur on 10.3.

brandonshoemakerTH avatar May 25 '23 20:05 brandonshoemakerTH

@xhernandez in our case only one brick appears to be affected, because only 1 Gluster node out of 4 was updated from 10.2 to 10.4. The related volume is only used for persisting data for a dockerized Redis instance. I can't really tell what Redis does there, but it appears to create a dump file every X minutes, which should be absolutely doable for Gluster.

Edit: log from /var/log/glusterfs/bricks/glusterfs-myvolumename-vol.log:

[2023-05-26 08:47:10.244980 +0000] E [MSGID: 115067] [server-rpc-fops_v2.c:1324:server4_writev_cbk] 0-myvolumename-vol-server: WRITE info [{frame=168085833}, {WRITEV_fd_no=0}, {uuid_utoa=00afcfe7-5701-418e-b8f8-ff1984032a68}, {client=CTX_ID:c70f43ca-2c20-41fa-b7e2-9786339b84fa-GRAPH_ID:0-PID:3542-HOST:myhostname-PC_NAME:myvolumename-vol-client-0-RECON_NO:-6}, {error-xlator=myvolumename-vol-posix}, {errno=28}, {error=No space left on device}]

FleloShe avatar May 26 '23 06:05 FleloShe

Can I safely downgrade from 11.0 to 10.3, too?

I noticed that if I stop the volume and start it back up, it starts working again. Another thing is that I can increase the time before it locks up again by increasing the number of file descriptors.
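In case it helps others try the same workaround, here is a minimal sketch of raising the open-file limit for glusterd via a systemd drop-in (the limit value is illustrative; brick processes spawned by glusterd inherit it):

```bash
# create a systemd drop-in that raises the open-file limit for glusterd
mkdir -p /etc/systemd/system/glusterd.service.d
cat > /etc/systemd/system/glusterd.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=1048576
EOF

# reload systemd and restart the daemon so the new limit applies
systemctl daemon-reload
systemctl restart glusterd
```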

nikow avatar Jun 04 '23 08:06 nikow

> Can I safely downgrade from 11.0 to 10.3, too?
>
> I noticed that if I stop the volume and start it back up, it starts working again. Another thing is that I can increase the time before it locks up again by increasing the number of file descriptors.

Yes, you can downgrade safely. Would it be possible for you to share the reproducer steps? We are not facing any issue on our daily regression test build server.

mohit84 avatar Jun 05 '23 04:06 mohit84

Following this issue, as we have started to encounter it on 10.4 as well.

ben-xo avatar Jul 17 '23 08:07 ben-xo

Are there any sysctl or glusterfs values which could be tuned to help delay this error until a permanent fix is created?

ufou avatar Jul 17 '23 08:07 ufou

I tried downgrading a single node (Ubuntu Jammy running the 10.4 package from http://ppa.launchpad.net/gluster/glusterfs-10/ubuntu) after building some 10.3 Ubuntu Jammy packages. Unfortunately, after installing the 10.3 packages, although the glusterd process starts normally, the gluster brick processes fail to start:

[2023-07-17 11:58:48.847641 +0000] E [MSGID: 106005] [glusterd-utils.c:6917:glusterd_brick_start] 0-management: Unable to start brick server1:/media/storage

and brick logs:

[2023-07-17 11:58:48.773195 +0000] W [MSGID: 101095] [xlator.c:392:xlator_dynload] 0-xlator: DL open failed [{error=/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/server.so: undefined symbol: xdr_gfx_readdir_rsp}]
[2023-07-17 11:58:48.773216 +0000] E [MSGID: 101002] [graph.y:211:volume_type] 0-parser: Volume 'storage-server', line 133: type 'protocol/server' is not valid or not found on this machine
[2023-07-17 11:58:48.773242 +0000] E [MSGID: 101019] [graph.y:321:volume_end] 0-parser: "type" not specified for volume storage-server
[2023-07-17 11:58:48.773539 +0000] E [MSGID: 100026] [glusterfsd.c:2509:glusterfs_process_volfp] 0-: failed to construct the graph []

Should I try 10.2?

ufou avatar Jul 17 '23 12:07 ufou

OK, ignore the last comment: I neglected to install all the supporting libs created by the build.sh script, so this now works to downgrade to 10.3:

dpkg -i libgfrpc0_10.3-ubuntu1~jammy1_amd64.deb libgfapi0_10.3-ubuntu1~jammy1_amd64.deb libgfchangelog0_10.3-ubuntu1~jammy1_amd64.deb glusterfs-client_10.3-ubuntu1~jammy1_amd64.deb glusterfs-common_10.3-ubuntu1~jammy1_amd64.deb glusterfs-server_10.3-ubuntu1~jammy1_amd64.deb libgfxdr0_10.3-ubuntu1~jammy1_amd64.deb  libglusterd0_10.3-ubuntu1~jammy1_amd64.deb libglusterfs0_10.3-ubuntu1~jammy1_amd64.deb libglusterfs-dev_10.3-ubuntu1~jammy1_amd64.deb

ufou avatar Jul 17 '23 13:07 ufou

I encountered the same "error=No space left on device" issue even though I had free space. However, in my case, the partitions where the bricks are located had run out of inodes. I'm posting this here in case someone else experiences the same problem.
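For anyone checking their own setup, df distinguishes inode exhaustion from a genuinely full disk (the brick mount point here is illustrative):

```bash
# inode usage: IUse% at 100% means ENOSPC even though blocks remain free
df -i /lvbackups

# block usage, for comparison
df -h /lvbackups
```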

sulphur avatar Aug 08 '23 15:08 sulphur

Setting storage.reserve (I used 5GB) on each volume fixed this for me with 10.4.
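That amounts to the standard volume-set command, something like the following (volume name illustrative; going by this report, 10.4 accepts an absolute size here as well as a percentage):

```bash
# keep 5GB reserved on each brick; client writes fail once free space drops below it
gluster volume set myvol storage.reserve 5GB
```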

NHellFire avatar Aug 19 '23 14:08 NHellFire

I have the same issue on a single-brick distributed volume with 10.4. Stopping and starting the volume resolves it temporarily. Setting storage.reserve to 1GB didn't help in our case.
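For reference, the stop/start workaround mentioned here and earlier in the thread is just (volume name illustrative; `stop` prompts for confirmation):

```bash
gluster volume stop myvol
gluster volume start myvol
```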

baskinsy avatar Aug 22 '23 22:08 baskinsy

We are constantly hitting this issue on a single-brick distributed volume (the simplest type of volume and installation): no other nodes, just one node with one brick, no special settings, a typical installation according to the documentation. It works for some time after a stop-start, and then the same thing happens again. This is getting very frustrating and makes GlusterFS unusable. Please provide packages to downgrade to 10.3.

baskinsy avatar Oct 01 '23 15:10 baskinsy

Having hit this same issue, I attempted the storage.reserve fix with no luck. I also attempted to downgrade to version 10.3 (using Debian's packages) and 10.1 (using the built-in Ubuntu packages), but in both cases the volume wouldn't start because of an "undefined symbol" error: in one case it was "mem_pools" and in the other "mem_pools_init".

dubsalicious avatar Oct 05 '23 13:10 dubsalicious

> Setting storage.reserve (I used 5GB) on each volume fixed this for me with 10.4.

Update: that only fixed it temporarily. I'm now back to almost every write returning "no space left", despite the least amount of free space in the cluster being 200GB, and setting storage.reserve is no longer making a difference. I've now upgraded all nodes to 11 and it's working again.

NHellFire avatar Oct 09 '23 12:10 NHellFire

This is the same issue on v11. Is there any tutorial on how to downgrade to v10.3 on Ubuntu 22.04?

AmineYagoub avatar Oct 09 '23 13:10 AmineYagoub

For those interested, I published fixed packages on my PPA for 22.04 and 20.04. They are based on the official 10.4 packages plus the patch fixing this issue (8830f22b2428dbec7bf610341d91d748057236f1). The upgrade should be automatic if you are using packages from the official PPA. https://launchpad.net/~yoann-laissus/+archive/ubuntu/gluster
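Presumably usable the standard way for a Launchpad PPA (a sketch; check the PPA page above for exact instructions):

```bash
sudo add-apt-repository ppa:yoann-laissus/gluster
sudo apt update
sudo apt upgrade   # pulls the patched gluster packages if they supersede the installed ones
```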

Arakmar avatar Oct 10 '23 10:10 Arakmar