Potential memory leak on client of GlusterFS v5.6
Description of problem: Potential memory leak on a client of GlusterFS v5.6. After a period of network instability (not fully verified, but it occurred before the memory increase), client memory consumption starts increasing (in steps, not continuously) until it consumes all available memory. Restarting the client does not solve the issue. A full restart (both servers and clients) solves the issue for a while, but it reappears after a few days.
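For reference, the step-wise growth described above can be recorded with a minimal, hypothetical monitoring loop on the client; the process match pattern and output path below are assumptions, not part of the original report:

```sh
# Hypothetical sketch: log the RSS of the FUSE client for vol1 once a minute.
while true; do
    pid=$(pgrep -f 'glusterfs.*vol1' | head -n 1)    # assumed client process
    [ -n "$pid" ] && echo "$(date -u +%FT%TZ) rss_kb=$(ps -o rss= -p "$pid")"
    sleep 60
done >> /tmp/glusterfs-client-rss.log
```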
The exact command to reproduce the issue: Set up 3 nodes in a cluster with the attached volume info.
The full output of the command that failed:
Expected results: N/A
Mandatory info:
- The output of the gluster volume info command:
Volume Name: vol1
Type: Replicate
Volume ID: 5fc66dfd-3449-4f6e-8eb9-4576d97cb8dd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node-2.storage-server:/mnt/bricks/vol1/brick
Brick2: node-0.storage-server:/mnt/bricks/vol1/brick
Brick3: node-1.storage-server:/mnt/bricks/vol1/brick
Options Reconfigured:
diagnostics.brick-sys-log-level: INFO
cluster.favorite-child-policy: majority
network.ping-timeout: 10
performance.io-cache: off
performance.read-ahead: off
performance.readdir-ahead: off
performance.stat-prefetch: off
performance.open-behind: off
cluster.server-quorum-type: server
performance.write-behind: off
performance.quick-read: off
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%
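For context, a volume with the layout and options listed above could be recreated with standard gluster CLI commands roughly as sketched below; the exact sequence is an assumption (and only a subset of the reconfigured options is shown), not the reporter's actual setup script:

```sh
# Sketch: create a 1 x 3 replicated volume matching the reported layout.
gluster volume create vol1 replica 3 \
    node-2.storage-server:/mnt/bricks/vol1/brick \
    node-0.storage-server:/mnt/bricks/vol1/brick \
    node-1.storage-server:/mnt/bricks/vol1/brick

# Apply a subset of the options shown under "Options Reconfigured".
gluster volume set vol1 network.ping-timeout 10
gluster volume set vol1 cluster.quorum-type auto
gluster volume set vol1 cluster.server-quorum-type server
gluster volume set vol1 performance.write-behind off
gluster volume set vol1 performance.quick-read off

gluster volume start vol1

# FUSE-mount on a client (mount point is an assumption).
mount -t glusterfs node-0.storage-server:/vol1 /mnt/vol1
```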
- The output of the gluster volume status command:
Status of volume: vol1
Gluster process TCP Port RDMA Port Online Pid
Brick node-2.storage-server:/mnt/bricks/vol1/brick N/A N/A N N/A
Brick node-1.storage-server:/mnt/bricks/vol1/brick 49152 0 Y 113
Brick node-0.storage-server:/mnt/bricks/vol1/brick 49154 0 Y 99
Self-heal Daemon on localhost N/A N/A Y 23168
Self-heal Daemon on node-1 N/A N/A Y 31375
Self-heal Daemon on node-2 N/A N/A Y 223
Task Status of Volume vol1
- The output of the gluster volume heal command:
Nothing is reported in the output of the heal commands (info/split-brain).
**- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/**
On the client side, many logs like the following: [2021-08-19 10:33:50.857022] E [MSGID: 114058] [client-handshake.c:1449:client_query_portmap_cbk] 0-vol1-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
On the server side, many logs like the following are reported on all three peer nodes: [2022-01-22T10:09:12Z] [glustershd] [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-vol1-client-2: changing port to 49154 (from 0)
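To correlate these errors with the memory growth, the client log can be bucketed by day; a small sketch (the log file name is an assumption, it is normally derived from the mount point):

```sh
# Count the port-map failures (MSGID 114058) per day in the client log.
# /var/log/glusterfs/mnt-vol1.log is an assumed name derived from the mount point.
grep 'MSGID: 114058' /var/log/glusterfs/mnt-vol1.log | awk '{print $1}' | sort | uniq -c
```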
**- Is there any crash? Provide the backtrace and coredump**
Additional info:
- The operating system / glusterfs version: GlusterFS v5.6
Statedumps of two clients and the server:
mnt-bricks-vol1-brick.120.dump.1643016382.txt client-0_glusterdump.20167.dump.1643018362.txt client-1_glusterdump.21551.dump.1643018561.txt
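For reference, statedumps like the ones attached can be taken with the standard GlusterFS mechanisms; a minimal sketch (the process match pattern is an assumption):

```sh
# Server/brick side: write statedumps for all brick processes of vol1
# (they land in /var/run/gluster/ by default).
gluster volume statedump vol1

# Client side: sending SIGUSR1 to the FUSE client process makes it dump
# its state into /var/run/gluster/ as well.
kill -USR1 "$(pgrep -f 'glusterfs.*vol1' | head -n 1)"
```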
Hey @stais, you are using a very old GlusterFS version. The currently maintained versions are glusterfs 9 and 10. There is a high possibility that these leaks are already fixed in the newer versions.
Also, from gluster v status we can see that the first brick is not online, and the same is being reported in the logs. Try to get this brick online and see if the issue persists.
Brick node-2.storage-server:/mnt/bricks/vol1/brick N/A N/A N N/A <----------------------- offline brick
Brick node-1.storage-server:/mnt/bricks/vol1/brick 49152 0 Y 113
Brick node-0.storage-server:/mnt/bricks/vol1/brick 49154 0 Y 99
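If the first brick is indeed down, it can usually be brought back without disturbing the healthy bricks, for example (a sketch, to be run on any peer):

```sh
# Start only the brick processes that are not running; online bricks are untouched.
gluster volume start vol1 force

# Confirm that all three bricks now show Online = Y and a TCP port.
gluster volume status vol1
```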
Hello @Sheetalpamecha, yes, one brick went offline (the network instability I referred to), but when everything came back to normal (all three bricks online), that was the starting point of the increasing memory consumption on the client side.
@stais Did you try it on the current releases, and is the bug reproducible there?
Hello,
We have no reproduction steps, so trying with a new version does not ensure that we will get a valid result.
I am attaching the statedumps of the client taken before the restart (when the memory increase was present) and just after the restart. glusterdump.1478.dump.1644349084_postRestart.txt glusterdump.21551.dump.1644347302_preRestart.txt
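Assuming the standard statedump memory-accounting format (a usage-type section header followed by size=/num_allocs= lines), the two dumps can be compared roughly like this to spot the allocation types that grew; this is a sketch, not an analysis that was actually run:

```sh
# List the ten largest 'size=' entries in each dump, keeping the usage-type
# header for context, so pre- and post-restart allocations can be compared.
for f in glusterdump.21551.dump.1644347302_preRestart.txt \
         glusterdump.1478.dump.1644349084_postRestart.txt; do
    echo "== $f =="
    awk '/usage-type/ {hdr=$0} /^size=/ {print $0, hdr}' "$f" \
        | sort -t= -k2 -rn | head -n 10
done
```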
Since we have seen logs similar to those in the following change, could that issue cause a similar memory consumption?
https://review.gluster.org/#/c/glusterfs/+/23181/
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.