
glustershd memory keeps increasing while creating PVCs

PrasadDesala opened this issue · 11 comments

glusterfs memory usage increased from 74 MB to 6.8 GB while creating 200 PVCs.

Before:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     1150 root      20   0 3637200  74560   3320 S   0.0  0.2   0:01.52 glusterfs

After 200 PVCs are created:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     1150 root      20   0  101.0g   6.8g   3388 S  94.1 21.6  17:43.07 glusterfs

Below are a few other observations:

  1. For a few of the volumes, the brick port is showing as -1:

         Volume : pvc-9480160e-1279-11e9-a7a2-5254001ae311
         +--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+-------+------+
         |               BRICK ID               |             HOST              |                                          PATH                                           | ONLINE | PORT  | PID  |
         +--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+-------+------+
         | b7a95b9b-17da-4220-a38d-2d23eb75c83a | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick1/brick | true   | 40635 | 3612 |
         | 133011b8-1825-4b6e-87e1-d7bed7332f55 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick2/brick | true   | -1    | 3041 |
         | ebfb7837-8657-46c9-aad9-449b6a1ba6bf | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick3/brick | true   | 45864 | 3146 |
         +--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+-------+------+
  2. I am seeing the below messages logged continuously in the glustershd logs (see the reserved-ports check sketched after this list):

         [2019-01-07 13:14:14.157784] W [MSGID: 101012] [common-utils.c:3186:gf_get_reserved_ports] 36-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]
         [2019-01-07 13:14:14.157840] W [MSGID: 101081] [common-utils.c:3226:gf_process_reserved_ports] 36-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
         [2019-01-07 13:14:14.160159] W [MSGID: 101012] [common-utils.c:3186:gf_get_reserved_ports] 36-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]
         [2019-01-07 13:14:14.160213] W [MSGID: 101081] [common-utils.c:3226:gf_process_reserved_ports] 36-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
         [2019-01-07 13:14:14.183845] I [socket.c:811:__socket_shutdown] 36-pvc-93515db8-1279-11e9-a7a2-5254001ae311-replicate-0-client-1: intentional socket shutdown(7073)
         [2019-01-07 13:14:14.183946] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 36-epoll: Failed to dispatch handler
  3. The below entries are continuously logged in the glusterd2 logs:

         time="2019-01-07 13:15:28.484617" level=info msg="client connected" address="10.233.64.8:47178" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
         time="2019-01-07 13:15:28.485340" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick2/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick2/brick not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
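The reserved-ports warnings in observation 2 only say that the kernel file is not visible to the process. A minimal way to check, assuming a standard sysctl-capable host and using example pod/namespace names (gluster-kube1-0, gcs) that may differ from the actual deployment:

```sh
# On the host: the sysctl file exists on any modern kernel; empty output
# simply means no ports are reserved.
cat /proc/sys/net/ipv4/ip_local_reserved_ports 2>/dev/null || echo "file not present"

# The same check from inside the gluster pod (pod and namespace names are examples).
kubectl -n gcs exec gluster-kube1-0 -- cat /proc/sys/net/ipv4/ip_local_reserved_ports \
    || echo "not visible inside the container"
```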

Observed behavior

glusterfs memory increased from 74 MB to 6.8 GB after 200 PVCs were created. The messages above also keep getting logged continuously.

Expected/desired behavior

glusterfs should not consume that much memory.

Details on how to reproduce (minimal and precise)

  1. Create a 3-node GCS setup using vagrant.
  2. Create 200 PVCs and keep monitoring glusterfs resource consumption (a sketch of this loop follows below).
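A minimal sketch of step 2. The storage class name (glusterfs-csi), namespace (gcs), and pod name (gluster-kube1-0) are assumptions for illustration; adjust them to the actual deployment:

```sh
#!/bin/bash
# Create 200 PVCs and sample the self-heal daemon's memory after each one.
for i in $(seq 1 200); do
  kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc-${i}
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: glusterfs-csi
  resources:
    requests:
      storage: 1Gi
EOF
  # RSS (in KiB) of the glustershd process inside the gluster pod.
  kubectl -n gcs exec gluster-kube1-0 -- \
    sh -c "ps -o rss=,args= -C glusterfs | grep glustershd" || true
done
```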

Information about the environment:

  • Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev.99.git0839909
  • Operating system used: CentOS 7.6
  • Glusterd2 compiled from sources, as a package (rpm/deb), or container:
  • Using External ETCD: (yes/no, if yes ETCD version): Yes, 3.3.8
  • If container, which container image:
  • Using kubernetes, openshift, or direct install:
  • If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: kubernetes

PrasadDesala · Jan 07 '19

Attaching the glusterd2 dump, glusterd2 logs, and glusterfs process statedump:

  • kube3-glusterd2.log.gz
  • kube2-glusterd2.log.gz
  • kube1-glusterd2.log.gz
  • glusterdump.1150.dump.1546865584.gz
  • statedump_kube-1.txt
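For reference, the attached glusterfs statedump can be regenerated by sending SIGUSR1 to the process; this is standard glusterfs behaviour, though the exact dump directory may differ inside the GCS containers (commonly /var/run/gluster):

```sh
# PID 1150 is the glusterfs (glustershd) process from the top output above.
kill -SIGUSR1 1150

# The dump is written as glusterdump.<pid>.dump.<timestamp> into the
# statedump directory (often /var/run/gluster; may differ in this setup).
ls -l /var/run/gluster/glusterdump.1150.dump.*
```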

PrasadDesala · Jan 07 '19

@PrasadDesala I am assuming you meant glustershd is consuming high memory? Also did you enable brick multiplexing in the setup?

atinmu · Jan 07 '19

> @PrasadDesala I am assuming you meant glustershd is consuming high memory? Also did you enable brick multiplexing in the setup?

I think it is glustershd, but I am not sure why glustershd would be consuming memory, as I am only creating PVCs, so no healing should take place. I see the process name as glusterfs.

Brick-mux is not enabled on the setup.

PrasadDesala · Jan 08 '19

> I think it is glustershd but I am not sure why glustershd is consuming memory as I am just creating PVCs so no healing should take place. I see the process name as glusterfs.

Yes, this is the self-heal process. It can be confirmed by checking cat /proc/<pid>/cmdline (a small sketch follows).
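A small sketch of that check, using PID 1150 from the top output above; /proc/<pid>/cmdline is NUL-separated, so translating the NULs to spaces makes it readable:

```sh
# Print the full command line of the suspected self-heal daemon process.
tr '\0' ' ' < /proc/1150/cmdline; echo
```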

aravindavk · Jan 08 '19

@itisravi @karthik-us ^^ it might be worth checking the same with a GD1-based deployment. This isn't a GD2-specific problem as such.

atinmu · Jan 08 '19

I suspect this is also due to https://review.gluster.org/#/c/glusterfs/+/21990/. Let's run a round of tests tomorrow, as it was merged today.

amarts · Jan 08 '19

On the latest master, across multiple iterations, we don't see the glustershd process's memory consumption anywhere near what has been reported, so I'm closing this for now. If we happen to hit this again, please feel free to reopen.

atinmu · Jan 16 '19

This issue is still seen on the latest nightly build. The glustershd process memory increased from about 8.4 MB (8616 KB RES) to 6.2 GB.

Before:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      395 root      20   0  514608   8616   3188 S   0.0  0.0   0:00.05 glusterfs

After:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      395 root      20   0   95.3g   6.2g   3324 S  88.2 19.9  14:49.35 glusterfs

    [root@gluster-kube1-0 ~]# cat /proc/395/cmdline
    /usr/sbin/glusterfs -s gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id gluster/glustershd -p /var/run/glusterd2/glustershd.pid -l /var/log/glusterd2/glusterfs/glustershd.log -S /var/run/glusterd2/shd-492ab606e75778b6.socket --xlator-option replicate.node-uuid=9842221d-97d1-4041-9d4c-51f6fc6ef191
    [root@gluster-kube1-0 ~]# ps -ef | grep -i glustershd

glusterd version: v6.0-dev.109.gitdfb2462

PrasadDesala · Jan 17 '19

Can we disable shd for now in this setup, and re-enable when things settle down?
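A minimal sketch of what that could look like, assuming the GD1-style option name cluster.self-heal-daemon is also accepted by glustercli against this GD2 build (not verified here); the volume name is just one of the PVC volumes from above:

```sh
# Disable the self-heal daemon for a volume (option name assumed from GD1).
glustercli volume set pvc-9480160e-1279-11e9-a7a2-5254001ae311 cluster.self-heal-daemon off

# Re-enable it once the memory issue is understood.
glustercli volume set pvc-9480160e-1279-11e9-a7a2-5254001ae311 cluster.self-heal-daemon on
```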

amarts · Jan 17 '19

@PrasadDesala At the moment we don't restart glustershd with every new PVC (which is a bug in GD2), so the overall memory consumption of the process remains static irrespective of how many PVCs we create, and that is what my test setup reflects too. So I'd definitely like to take a look at a setup where you are able to reproduce this.

atinmu · Jan 17 '19

@atinmu This issue is closed and I don't have the permissions to reopen it. If you have access, can you please reopen this issue?

PrasadDesala · Jan 21 '19