
When one of the nodes is down, volume stop fails

Open Akarsha-rai opened this issue 5 years ago • 15 comments

Observed behavior

When one of the nodes is down, volume stop fails.

Expected/desired behavior

When one of the nodes is down, volume stop should succeed, as it does in the GD1 test cases.

Details on how to reproduce (minimal and precise)

  1. Create and start a volume.
  2. Stop glusterd2 on one node.
  3. Try stopping the volume; it fails:
glustercli volume stop testvol
Volume stop failed

Response headers:
X-Request-Id: 25121e30-340a-4920-9043-bb40ce8238fb
X-Gluster-Cluster-Id: a6d654f2-0739-4bcc-a545-d9da40931398
X-Gluster-Peer-Id: eb391d57-668c-43ff-a2d1-28727466752f

Response body:
node e4130ab5-1330-4349-879b-111ff6128d6f is probably down

Information about the environment:

  • glusterd2 --version
glusterd version: v6.0-dev.69.git5f88917
git SHA: 5f88917
go version: go1.11.2
go OS/arch: linux/amd64

Operating system used:
[root@dhcp35-229 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)

Glusterd2 compiled from sources, as a package (rpm/deb), or container:
package

Using External ETCD: (yes/no, if yes ETCD version):
yes, etcdmain: etcd Version: 3.3.8

If container, which container image:

Using kubernetes, openshift, or direct install:
direct install

If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside

Other useful information

  • glusterd2 config files from all nodes (default /etc/glusterd2/glusterd2.toml)
 cat /etc/glusterd2/glusterd2.toml 
localstatedir = "/var/lib/glusterd2"
logdir = "/var/log/glusterd2"
logfile = "glusterd2.log"
loglevel = "INFO"
rundir = "/var/run/glusterd2"
defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"
#restauth should be set to false to disable REST authentication in glusterd2
restauth = false
etcdendpoints = "http://10.70.35.10:2379"
noembed = true

Akarsha-rai avatar Dec 12 '18 13:12 Akarsha-rai

@Akarsha-rai can you provide the volume info output, please? Were any bricks of the volume hosted on the node that was down?

atinmu avatar Dec 12 '18 13:12 atinmu

@atinmu, yes, we did. Volume info and status of testvol:

glustercli volume info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 464b4ef2-e933-4c1f-b839-1143f13c513d
State: Stopped
Transport-type: tcp
Options:
    cluster/replicate.self-heal-daemon: on
    performance/io-cache: off
    performance/md-cache: off
    performance/open-behind: off
    performance/quick-read: off
    performance/read-ahead: off
    performance/readdir-ahead: off
    performance/write-behind: off
Number of Bricks: 2 x 3 = 6
Brick1: 10.70.35.121:/bricks/brick1/testvol
Brick2: 10.70.35.122:/bricks/brick1/testvol
Brick3: 10.70.35.4:/bricks/brick1/testvol
Brick4: 10.70.35.121:/bricks/brick2/testvol
Brick5: 10.70.35.122:/bricks/brick2/testvol
Brick6: 10.70.35.4:/bricks/brick2/testvol

glustercli volume status testvol
Volume : testvol
+--------------------------------------+--------------+------------------------+--------+-------+-------+
|               BRICK ID               |     HOST     |          PATH          | ONLINE | PORT  |  PID  |
+--------------------------------------+--------------+------------------------+--------+-------+-------+
| 5abdb0a7-4ec5-4678-9ed6-efa1a083c0ce | 10.70.35.121 | /bricks/brick2/testvol | true   | 33824 | 16567 |
| 2251de23-3309-4715-98e0-d834c493d68b | 10.70.35.122 | /bricks/brick2/testvol | true   | 37857 |  4897 |
| d34ca239-d512-444b-bac0-169f9626ba47 | 10.70.35.4   | /bricks/brick2/testvol | false  |     0 |     0 |
| 309ce692-cfc8-480b-aeac-bc943fc9e356 | 10.70.35.121 | /bricks/brick1/testvol | true   | 35297 | 16546 |
| 926928c7-7f8f-4491-a11e-51998c6d3ebf | 10.70.35.122 | /bricks/brick1/testvol | true   | 45147 |  4876 |
| 8ba065ce-0906-424d-bc72-fb686881d4f5 | 10.70.35.4   | /bricks/brick1/testvol | false  |     0 |     0 |
+--------------------------------------+--------------+------------------------+--------+-------+-------+

Akarsha-rai avatar Dec 12 '18 13:12 Akarsha-rai

@Akarsha-rai I think this is not a bug then because the volume cannot be stopped if the bricks are unreachable.

rishubhjain avatar Dec 12 '18 13:12 rishubhjain

@atinmu There is a GD1 test case for this: https://github.com/gluster/glusto-tests/blob/master/tests/functional/glusterd/test_volume_delete.py#L102

I tried running this manually on glusterd version: glusterfs-3.8.4-52.el7rhgs.x86_64

gluster volume status vol1
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.48:/bricks/brick1/vol1-b1    49152     0          Y       25161
Brick 10.70.43.156:/bricks/brick1/vol1-b1   49159     0          Y       8805 
Brick 10.70.42.48:/bricks/brick1/vol1-b2    49153     0          Y       25180
Brick 10.70.43.156:/bricks/brick1/vol1-b2   49160     0          Y       8824 
Self-heal Daemon on localhost               N/A       N/A        Y       25200
Self-heal Daemon on 10.70.43.156            N/A       N/A        Y       8844 
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

After stopping glusterd on node 10.70.43.156

gluster volume status vol1
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.48:/bricks/brick1/vol1-b1    49152     0          Y       25161
Brick 10.70.42.48:/bricks/brick1/vol1-b2    49153     0          Y       25180
Self-heal Daemon on localhost               N/A       N/A        Y       25200
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
 
gluster volume stop vol1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: vol1: success

Akarsha-rai avatar Dec 12 '18 14:12 Akarsha-rai

I'm trying to hunt down the reason for this behaviour in GD1.

atinmu avatar Dec 12 '18 14:12 atinmu

If a node is down, we can allow the volume to be stopped without any side effects. But if only glusterd is down and we treat that as the node being down, a stale brick process will be left behind on the node where glusterd was down. @Akarsha-rai can you verify this in your glusterd1 setup? After stopping glusterd on 10.70.43.156 and stopping the volume, can you check on that node whether the glusterfsd processes are still running?

aravindavk avatar Dec 13 '18 04:12 aravindavk

@Akarsha-rai I think this is not a bug then because the volume cannot be stopped if the bricks are unreachable.

We need not stop the bricks because they are already stopped, since the node is down.
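
To make the idea concrete, here is a minimal Go sketch (illustrative only; the types and function names are made up and this is not the actual glusterd2 transaction framework) of a stop transaction that runs its brick-stop step only on reachable peers and records the unreachable ones for later cleanup:

package main

import "fmt"

// Peer is a simplified stand-in for a cluster peer record.
type Peer struct {
    ID     string
    Online bool
}

// stopStepTargets splits the peers hosting bricks of the volume into the ones
// that should run the brick-stop step now and the ones that are unreachable.
// Bricks on unreachable peers are either already dead (node down) or must be
// cleaned up when glusterd2 on that peer comes back (glusterd-only-down case).
func stopStepTargets(peers []Peer) (run, skipped []Peer) {
    for _, p := range peers {
        if p.Online {
            run = append(run, p)
        } else {
            skipped = append(skipped, p)
        }
    }
    return run, skipped
}

func main() {
    peers := []Peer{
        {ID: "eb391d57", Online: true},
        {ID: "e4130ab5", Online: false}, // the "probably down" node from the report
    }
    run, skipped := stopStepTargets(peers)
    fmt.Printf("run stop step on %d peer(s), skip %d offline peer(s)\n", len(run), len(skipped))
}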

aravindavk avatar Dec 13 '18 04:12 aravindavk

@aravindavk I was considering the scenario where glusterd2 was down and the node is reported as down to other nodes.

rishubhjain avatar Dec 13 '18 04:12 rishubhjain

We also need to consider what happens if the node has a network connectivity issue with etcd.

Madhu-1 avatar Dec 13 '18 05:12 Madhu-1

Here's the GD1 behavior:

If the volume stop goes through while GD1 (or the node itself) is down on one of the nodes, the brick process on that node is brought down once GD1/the node comes back online.
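
For context, here is a minimal Go sketch of that restart-time cleanup (illustrative only; the types, helpers, and data source are invented, not the real GD1/glusterd2 code): when the management daemon comes back up, any local brick process belonging to a volume recorded as Stopped is terminated.

package main

import (
    "fmt"
    "syscall"
)

// Brick is a simplified record of a locally hosted brick.
type Brick struct {
    VolName string
    Path    string
    PID     int // 0 means no known process
}

// reconcileStoppedVolumes terminates brick processes of volumes whose stored
// state is "Stopped". In a real daemon the volume state would come from the
// store (etcd) and the PID from the brick's pidfile.
func reconcileStoppedVolumes(volState map[string]string, bricks []Brick) {
    for _, b := range bricks {
        if volState[b.VolName] != "Stopped" || b.PID == 0 {
            continue
        }
        // Best-effort termination of the stale glusterfsd process.
        if err := syscall.Kill(b.PID, syscall.SIGTERM); err != nil {
            fmt.Printf("failed to stop stale brick %s (pid %d): %v\n", b.Path, b.PID, err)
            continue
        }
        fmt.Printf("stopped stale brick %s (pid %d)\n", b.Path, b.PID)
    }
}

func main() {
    volState := map[string]string{"vol1": "Stopped"}
    bricks := []Brick{{VolName: "vol1", Path: "/bricks/brick1/vol1-b1", PID: 25802}}
    reconcileStoppedVolumes(volState, bricks)
}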

atinmu avatar Dec 13 '18 05:12 atinmu

@aravindavk, after the volume is stopped and when node 10.70.43.156 is down:

ps aux | grep gluster
root     25802  0.1  0.1 1022820 15076 ?       Ssl  01:59   0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b1 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b1.pid -S /var/run/gluster/801d560da24eca5bcc35bd1c601c1c6e.socket --brick-name /bricks/brick1/vol1-b1 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b1.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49152 --xlator-option vol1-server.listen-port=49152
root     25821  0.1  0.1 1022820 12504 ?       Ssl  01:59   0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b2 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b2.pid -S /var/run/gluster/23c8b75461d5de6f153cb61273d1c6b4.socket --brick-name /bricks/brick1/vol1-b2 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b2.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49153 --xlator-option vol1-server.listen-port=49153
root     25841  0.1  0.1 678320  8952 ?        Ssl  01:59   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1faa167fb6e04fa8fbab068fa514f94a.socket --xlator-option *replicate*.node-uuid=162b044e-3185-4c00-a454-adbeb8e84d39
root     25924  0.0  0.0 112664   972 pts/0    S+   02:01   0:00 grep --color=auto gluster

After starting glusterd on node 10.70.43.156

ps aux | grep gluster
root     25938  5.0  0.0 409932  6664 ?        Ssl  02:01   0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     26153  0.0  0.0 112660   968 pts/0    S+   02:02   0:00 grep --color=auto gluster

Akarsha-rai avatar Dec 13 '18 07:12 Akarsha-rai

after volume is stopped and when the node 10.70.43.156 is down:

I need the ps output when the node is up but glusterd is down. Same steps as mentioned in https://github.com/gluster/glusterd2/issues/1393#issuecomment-446601131

aravindavk avatar Dec 13 '18 07:12 aravindavk

Initial ps output on node 10.70.43.156

ps aux | grep gluster
root     25468  0.5  0.0 604872  5496 ?        Ssl  01:53   0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     25686  0.0  0.0 112660   972 pts/0    S+   01:53   0:00 grep --color=auto gluster

After volume "vol1" created, ps output on node 10.70.43.156

ps aux | grep gluster
root     25468  0.1  0.1 604876 11160 ?        Ssl  01:53   0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     25802  0.2  0.1 1022820 12460 ?       Ssl  01:59   0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b1 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b1.pid -S /var/run/gluster/801d560da24eca5bcc35bd1c601c1c6e.socket --brick-name /bricks/brick1/vol1-b1 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b1.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49152 --xlator-option vol1-server.listen-port=49152
root     25821  0.1  0.1 1022820 12464 ?       Ssl  01:59   0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b2 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b2.pid -S /var/run/gluster/23c8b75461d5de6f153cb61273d1c6b4.socket --brick-name /bricks/brick1/vol1-b2 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b2.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49153 --xlator-option vol1-server.listen-port=49153
root     25841  0.4  0.1 678320  8940 ?        Ssl  01:59   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1faa167fb6e04fa8fbab068fa514f94a.socket --xlator-option *replicate*.node-uuid=162b044e-3185-4c00-a454-adbeb8e84d39
root     25866  0.0  0.0 112660   972 pts/0    R+   02:00   0:00 grep --color=auto gluster

When glusterd is stopped on node

ps aux | grep gluster
root     25802  0.0  0.1 1022820 15036 ?       Ssl  01:59   0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b1 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b1.pid -S /var/run/gluster/801d560da24eca5bcc35bd1c601c1c6e.socket --brick-name /bricks/brick1/vol1-b1 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b1.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49152 --xlator-option vol1-server.listen-port=49152
root     25821  0.0  0.1 1022820 12464 ?       Ssl  01:59   0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b2 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b2.pid -S /var/run/gluster/23c8b75461d5de6f153cb61273d1c6b4.socket --brick-name /bricks/brick1/vol1-b2 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b2.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49153 --xlator-option vol1-server.listen-port=49153
root     25841  0.1  0.1 678320  8940 ?        Ssl  01:59   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1faa167fb6e04fa8fbab068fa514f94a.socket --xlator-option *replicate*.node-uuid=162b044e-3185-4c00-a454-adbeb8e84d39
root     25914  0.0  0.0 112664   972 pts/0    S+   02:01   0:00 grep --color=auto gluster

When glusterd is down on node 10.70.43.156 and volume "vol1" has been stopped, the ps output is similar to the one above.

After starting glusterd, ps output on 10.70.43.156

 ps aux | grep gluster
root     25938  5.0  0.0 409932  6664 ?        Ssl  02:01   0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     26153  0.0  0.0 112660   968 pts/0    S+   02:02   0:00 grep --color=auto gluster

Akarsha-rai avatar Dec 13 '18 07:12 Akarsha-rai

@atinmu we can achieve similar behavior. Do you see any problem if a stale brick process exists until glusterd2 comes back up?

aravindavk avatar Dec 13 '18 07:12 aravindavk

@aravindavk I still think, functionality-wise, we should disallow volume stop if a node or glusterd service hosting any of the bricks is down. For example, take a distribute-only volume: if I stop the volume while the glusterd service is down but the node (and hence the glusterfsd process) is still up, any existing mount can still read from and write to that brick, which is wrong in principle since the volume is in the stopped state. What do you think?

But I'm still trying to figure out whether there was any specific reason that made us go with different logic in GD1.

Do you see any problem if stale brick process exists till glusterd2 again comes back up?

That's how GD1 currently behaves as well, but that's a problem and looks like a bug in GD1.

@Akarsha-rai if you can manage to figure out the RHBZ which triggered this test case to be added, it'd be awesome :-)

atinmu avatar Dec 13 '18 08:12 atinmu