
Potential memory leak on client of GlusterFS v5.6

Open stais opened this issue 3 years ago • 7 comments

Description of problem: Potential memory leak on the client side of GlusterFS v5.6. After a period of network instability (not fully verified, but it occurred before the memory increase), client memory consumption starts increasing in steps (not continuously) until it exhausts all available resources. Restarting the client does not solve the issue. A full restart (both servers and clients) solves the issue for a while, but it reappears after a few days.
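A minimal way to quantify the step-wise growth, assuming the FUSE client runs as a process named glusterfs (process name and sampling interval are assumptions, not from the original report), is to sample its resident memory periodically:

# Sample RSS/VSZ of the glusterfs client every 5 minutes
while true; do
    date
    ps -o pid,rss,vsz,cmd -C glusterfs
    sleep 300
done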

The exact command to reproduce the issue: Setup of 3 nodes in a cluster, with the volume info attached below (a sketch of the setup follows).
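For reference, a minimal sketch of how such a 1 x 3 replicated volume could be created and mounted. Hostnames and brick paths are taken from the volume info below; the exact commands used in this deployment are an assumption:

# From one node, after peering the other two:
gluster peer probe node-1.storage-server
gluster peer probe node-2.storage-server
gluster volume create vol1 replica 3 node-2.storage-server:/mnt/bricks/vol1/brick node-0.storage-server:/mnt/bricks/vol1/brick node-1.storage-server:/mnt/bricks/vol1/brick
gluster volume start vol1
# On the client (mount point is an assumption):
mount -t glusterfs node-0.storage-server:/vol1 /mnt/vol1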

The full output of the command that failed:

Expected results: N/A

Mandatory info:

- The output of the gluster volume info command:

Volume Name: vol1
Type: Replicate
Volume ID: 5fc66dfd-3449-4f6e-8eb9-4576d97cb8dd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node-2.storage-server:/mnt/bricks/vol1/brick
Brick2: node-0.storage-server:/mnt/bricks/vol1/brick
Brick3: node-1.storage-server:/mnt/bricks/vol1/brick
Options Reconfigured:
diagnostics.brick-sys-log-level: INFO
cluster.favorite-child-policy: majority
network.ping-timeout: 10
performance.io-cache: off
performance.read-ahead: off
performance.readdir-ahead: off
performance.stat-prefetch: off
performance.open-behind: off
cluster.server-quorum-type: server
performance.write-behind: off
performance.quick-read: off
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%
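The non-default options listed under Options Reconfigured are applied with gluster volume set. A short sketch showing the pattern for a few of them (not necessarily the exact order or commands used in this deployment):

gluster volume set vol1 network.ping-timeout 10
gluster volume set vol1 performance.write-behind off
gluster volume set vol1 cluster.quorum-type auto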

- The output of the gluster volume status command:

Status of volume: vol1
Gluster process TCP Port RDMA Port Online Pid

Brick node-2.storage-server:/mnt/bricks/vol1/brick N/A N/A N N/A
Brick node-1.storage-server:/mnt/bricks/vol1/brick 49152 0 Y 113
Brick node-0.storage-server:/mnt/bricks/vol1/brick 49154 0 Y 99
Self-heal Daemon on localhost N/A N/A Y 23168
Self-heal Daemon on node-1 N/A N/A Y 31375
Self-heal Daemon on node-2 N/A N/A Y 223

Task Status of Volume vol1

- The output of the gluster volume heal command: Nothing is reported in the heal commands' (info/split-brain) output.
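For completeness, the heal commands presumably run here are the standard ones:

gluster volume heal vol1 info
gluster volume heal vol1 info split-brain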

- Provide logs present in the following location on client and server nodes: /var/log/glusterfs/

On the client side, many logs like the following:

[2021-08-19 10:33:50.857022] E [MSGID: 114058] [client-handshake.c:1449:client_query_portmap_cbk] 0-vol1-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.

On the server side, many logs like the following are reported on all three peer nodes:

[2022-01-22T10:09:12Z] [glustershd] [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-vol1-client-2: changing port to 49154 (from 0)

- Is there any crash? Provide the backtrace and coredump

Additional info:

- The operating system / glusterfs version: GlusterFS v5.6

stais, Jan 30 '22 11:01

Hey @stais, you are using a very old GlusterFS version. The currently maintained versions are glusterfs 9 and 10. There is a high possibility that these leaks are already fixed in newer versions.

Also, from gluster v status we can see that the 1st brick is not online, and the same is being reported in the logs. Try to get this brick online (see the sketch after the quoted status below) and see if the issue persists.

Brick node-2.storage-server:/mnt/bricks/vol1/brick N/A N/A N N/A <----------------------- offline brick
Brick node-1.storage-server:/mnt/bricks/vol1/brick 49152 0 Y 113
Brick node-0.storage-server:/mnt/bricks/vol1/brick 49154 0 Y 99
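A minimal sketch of one way to bring the offline brick back up (standard GlusterFS CLI; start ... force restarts brick processes that are down without touching the ones already running):

gluster volume start vol1 force
# then verify the brick is listed as online:
gluster volume status vol1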

Sheetalpamecha, Jan 31 '22 04:01

Hello @Sheetalpamecha, yes, one brick went offline (the network instability I referred to), but when everything came back to normal (all three bricks online), that was the starting point of the increasing memory consumption on the client side.

stais, Jan 31 '22 09:01

@stais Did you try it on the current releases, and is the bug reproducible there?

Sheetalpamecha, Feb 01 '22 04:02

Hello,

We have no reproduction steps, so trying with a new version does not ensure that we will get a valid result.

I am attaching the statedumps of the client from before the restart (while the memory increase was present) and from just after the restart:

glusterdump.1478.dump.1644349084_postRestart.txt
glusterdump.21551.dump.1644347302_preRestart.txt
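For reference, client statedumps like these are typically generated by sending SIGUSR1 to the FUSE client process; the files land in /var/run/gluster by default (a sketch, assuming the default dump directory and that the mount is the only glusterfs process on the machine):

# Find the glusterfs client process and trigger a statedump
pidof glusterfs
kill -USR1 <pid-of-glusterfs-client>
ls /var/run/gluster/glusterdump.*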

stais, Feb 14 '22 12:02

Since we have seen logs similar to those in the following change, could that issue cause similar memory consumption?

https://review.gluster.org/#/c/glusterfs/+/23181/

stais, Feb 14 '22 15:02

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot], Sep 21 '22 00:09

Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.

stale[bot], Nov 01 '22 21:11