
GlusterFS Volume Reporting Only Single Node Capacity

Open SRP86 opened this issue 3 weeks ago • 3 comments

Hello,

We are currently facing a problem where our GlusterFS volume reports an incorrect total size on the gateway node. Instead of reflecting the full cluster usable capacity of ~1.2 PB, the mounted volume shows only ~41 TB, which corresponds to the usable size of a single data node. Below is a detailed overview of our environment, Gluster version, and problem timeline for your reference.

Cluster Architecture:

We have a 15-node GlusterFS cluster, with each node populated with 36 hard drives.

Individual disk capacity: 4 TB
Usable capacity per disk: ~3.64 TB
Total brick count: 540 bricks
Gluster version: 11.1

Volume Configuration:

The storage is configured as a Disperse Volume with the following layout:

Disperse data: 11
Redundancy: 4
Bricks per disperse set: 15 (11+4)
Transport: TCP
The overall populated usable storage is approximately 1.2 PB.
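
For reference, a rough capacity check under this layout (a back-of-the-envelope sketch; the exact figure depends on TB vs TiB conversion, filesystem overhead, and reserved space, which is presumably why it lands somewhat above the ~1.2 PB quoted above):

```
# Layout sanity check, assuming the 11+4 disperse sets described above
echo $((540 / 15))            # -> 36 disperse subvolumes
echo "11 * 3.64" | bc         # -> ~40 TB usable per subvolume
echo "36 * 11 * 3.64" | bc    # -> ~1441 TB usable across the cluster
```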

Problem Timeline:

  1. For over 3 months, the mounted volume on the gateway correctly showed ~1.2 PB usable capacity. Last week, one data node went down.
  2. Following this, the Gluster volume was stopped.
  3. The volume was later mounted using only 14 nodes while the failed node was still down.
  4. From that point onward, the gateway started showing only ~41 TB, which matches the usable capacity of a single node.
  5. The failed data node has now been fully restored and is back in service; however, the issue persists, and the mounted volume continues to show only ~41 TB instead of the expected full cluster capacity.
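
Since `df` on a distributed GlusterFS volume only reflects the subvolumes the client is actually connected to, a minimal connectivity check from the gateway would look roughly like this (a sketch assuming a standard FUSE mount; the volume name `myvol` and mount point `/mnt/gluster` are placeholders):

```
# Run on any server node: all 540 bricks should be listed as online
gluster volume status myvol

# Run on the gateway: look for bricks the FUSE client failed to reach.
# Assumes the default log location; the log file is named after the mount point.
grep -iE "disconnect|failed" /var/log/glusterfs/mnt-gluster.log | tail -n 50
```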

Solutions Tried:

  1. Stopped the volume, unmounted the storage, and restarted the glusterd service on all data nodes; however, the issue persisted.
  2. Stopped the volume, unmounted the storage, re-mounted all the brick filesystems, and restarted the glusterd service on all nodes (sketched in shell form below); even after this, the volume still reports only ~41 TB.
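
In shell terms, the sequence in step 2 was roughly the following (volume name, server name, and mount point anonymized as placeholders):

```
# On the gateway
umount /mnt/gluster

# On any server node
gluster volume stop myvol

# On every data node
systemctl restart glusterd

# On any server node
gluster volume start myvol

# Back on the gateway
mount -t glusterfs node01:/myvol /mnt/gluster
df -h /mnt/gluster    # still reports ~41 TB
```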

We request your assistance in investigating why the volume is still limited to single-node capacity even after the failed node was restored. Specifically: did the 14-node mount during the downtime affect the disperse volume metadata, and what is the correct procedure to safely restore full volume visibility without risking data integrity?

Regards, SRP

SRP86 avatar Dec 09 '25 12:12 SRP86

A lot more information is needed to even begin to think about what could be wrong; for example, the output of gluster vol info and gluster vol heal info, among other commands.
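
Concretely, something like the following (replace `<volname>` with your volume name):

```
gluster volume info <volname>
gluster volume heal <volname> info
gluster volume status <volname>
gluster peer status
```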

Based on the information you provided, I can see that the storage capacity you are getting is not that of ONE NODE; it is actually that of one subvolume in a distributed-disperse volume.

In your case it should be 36 x (11 + 4) = 540 bricks.

11 (data bricks) * 4 TB = 44 TB raw, which is roughly the ~41 TB you are seeing. You would have 36 such subvolumes if you have 36 hard drives on each node; that is how it should be configured. Please check what information you should provide while raising an issue and include all of it to get help.
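
For a correctly configured distributed-disperse layout, `gluster volume info` should report something along these lines (illustrative output only, not taken from your cluster):

```
Type: Distributed-Disperse
Number of Bricks: 36 x (11 + 4) = 540
```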

aspandey avatar Dec 09 '25 17:12 aspandey

@aspandey Yes, I have 36 HDDs in each node. Attached are the text files with the heal information and volume information.

gluster_vol_heal_info.txt gluster_vol_info.txt

Regarding the "other commands": what other information are you looking for? Please let me know.

SRP86 avatar Dec 10 '25 12:12 SRP86


https://docs.gluster.org/en/latest/Contributors-Guide/Bug-Reporting-Guidelines/

aspandey avatar Dec 10 '25 17:12 aspandey

@aspandey Attached more information; please let me know if anything further is needed.

info_for_gluster_reporting.txt

SRP86 avatar Dec 13 '25 18:12 SRP86