When one of the nodes is down, volume stop fails
Observed behavior
When one of the nodes is down, stopping a volume fails.
Expected/desired behavior
When one of the nodes is down, volume stop should succeed, as it does in the equivalent GD1 cases.
Details on how to reproduce (minimal and precise)
- Create and start a volume.
- Stop glusterd2 on one node.
- Try stopping the volume; it fails (see the condensed command sequence after the output below).
glustercli volume stop testvol
Volume stop failed
Response headers:
X-Request-Id: 25121e30-340a-4920-9043-bb40ce8238fb
X-Gluster-Cluster-Id: a6d654f2-0739-4bcc-a545-d9da40931398
X-Gluster-Peer-Id: eb391d57-668c-43ff-a2d1-28727466752f
Response body:
node e4130ab5-1330-4349-879b-111ff6128d6f is probably down
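For reference, the reproduce sequence condenses to the commands below (a sketch: the --replica flag syntax and the systemd unit name glusterd2.service are assumptions about this glustercli version and setup; host1..host3 stand in for the peer addresses):
glustercli volume create testvol --replica 3 host1:/bricks/brick1/testvol host2:/bricks/brick1/testvol host3:/bricks/brick1/testvol
glustercli volume start testvol
systemctl stop glusterd2        # run on one of the three nodes
glustercli volume stop testvol  # run from a healthy node; fails as shown above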
Information about the environment:
- glusterd2 --version
glusterd version: v6.0-dev.69.git5f88917
git SHA: 5f88917
go version: go1.11.2
go OS/arch: linux/amd64
Operating system used:
[root@dhcp35-229 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)
Glusterd2 compiled from sources, as a package (rpm/deb), or container:
package
Using External ETCD: (yes/no, if yes ETCD version):
yes, etcd version 3.3.8
If container, which container image:
Using kubernetes, openshift, or direct install:
direct install
If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside
Other useful information
- glusterd2 config files from all nodes (default /etc/glusterd2/glusterd2.toml)
cat /etc/glusterd2/glusterd2.toml
localstatedir = "/var/lib/glusterd2"
logdir = "/var/log/glusterd2"
logfile = "glusterd2.log"
loglevel = "INFO"
rundir = "/var/run/glusterd2"
defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"
#restauth should be set to false to disable REST authentication in glusterd2
restauth = false
etcdendpoints = "http://10.70.35.10:2379"
noembed = true
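Since the deployment uses an external etcd (noembed = true above), it is worth ruling out store connectivity problems when a peer shows up as down. A quick health probe against the configured endpoint (a sketch; assumes etcdctl is installed on the node):
ETCDCTL_API=3 etcdctl --endpoints=http://10.70.35.10:2379 endpoint health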
@Akarsha-rai can you provide the volume info output, please? Did the volume have any bricks hosted on the node that was down?
@atinmu, yes we had. Volume info and status of testvol:
glustercli volume info
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 464b4ef2-e933-4c1f-b839-1143f13c513d
State: Stopped
Transport-type: tcp
Options:
cluster/replicate.self-heal-daemon: on
performance/io-cache: off
performance/md-cache: off
performance/open-behind: off
performance/quick-read: off
performance/read-ahead: off
performance/readdir-ahead: off
performance/write-behind: off
Number of Bricks: 2 x 3 = 6
Brick1: 10.70.35.121:/bricks/brick1/testvol
Brick2: 10.70.35.122:/bricks/brick1/testvol
Brick3: 10.70.35.4:/bricks/brick1/testvol
Brick4: 10.70.35.121:/bricks/brick2/testvol
Brick5: 10.70.35.122:/bricks/brick2/testvol
Brick6: 10.70.35.4:/bricks/brick2/testvol
glustercli volume status testvol
Volume : testvol
+--------------------------------------+--------------+------------------------+--------+-------+-------+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+--------------+------------------------+--------+-------+-------+
| 5abdb0a7-4ec5-4678-9ed6-efa1a083c0ce | 10.70.35.121 | /bricks/brick2/testvol | true | 33824 | 16567 |
| 2251de23-3309-4715-98e0-d834c493d68b | 10.70.35.122 | /bricks/brick2/testvol | true | 37857 | 4897 |
| d34ca239-d512-444b-bac0-169f9626ba47 | 10.70.35.4 | /bricks/brick2/testvol | false | 0 | 0 |
| 309ce692-cfc8-480b-aeac-bc943fc9e356 | 10.70.35.121 | /bricks/brick1/testvol | true | 35297 | 16546 |
| 926928c7-7f8f-4491-a11e-51998c6d3ebf | 10.70.35.122 | /bricks/brick1/testvol | true | 45147 | 4876 |
| 8ba065ce-0906-424d-bc72-fb686881d4f5 | 10.70.35.4 | /bricks/brick1/testvol | false | 0 | 0 |
+--------------------------------------+--------------+------------------------+--------+-------+-------+
@Akarsha-rai I think this is not a bug then because the volume cannot be stopped if the bricks are unreachable.
@atinmu There is a test case for this in GD1: https://github.com/gluster/glusto-tests/blob/master/tests/functional/glusterd/test_volume_delete.py#L102
I tried running this manually on glusterd version: glusterfs-3.8.4-52.el7rhgs.x86_64
gluster volume status vol1
Status of volume: vol1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.42.48:/bricks/brick1/vol1-b1 49152 0 Y 25161
Brick 10.70.43.156:/bricks/brick1/vol1-b1 49159 0 Y 8805
Brick 10.70.42.48:/bricks/brick1/vol1-b2 49153 0 Y 25180
Brick 10.70.43.156:/bricks/brick1/vol1-b2 49160 0 Y 8824
Self-heal Daemon on localhost N/A N/A Y 25200
Self-heal Daemon on 10.70.43.156 N/A N/A Y 8844
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
After stopping glusterd on node 10.70.43.156
gluster volume status vol1
Status of volume: vol1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.42.48:/bricks/brick1/vol1-b1 49152 0 Y 25161
Brick 10.70.42.48:/bricks/brick1/vol1-b2 49153 0 Y 25180
Self-heal Daemon on localhost N/A N/A Y 25200
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
gluster volume stop vol1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: vol1: success
I'm trying to hunt down the reason for this behaviour in GD1.
If a node is down, we can allow stopping the volume without any side effects. But if only glusterd is down and we treat the node as down, a stale brick process will be left behind on the node where glusterd was down. @Akarsha-rai can you verify this in your glusterd1 setup? After stopping glusterd on 10.70.43.156 and stopping the volume, check on that node whether glusterfsd processes are still running.
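A quick way to run that check (a sketch; assumes glusterd is managed by systemd and each command is run on the node indicated in the comment):
systemctl stop glusterd       # on 10.70.43.156
gluster volume stop vol1      # on the other node, 10.70.42.48
pgrep -af glusterfsd          # back on 10.70.43.156; any output means a stale brick process survived the volume stop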
@Akarsha-rai I think this is not a bug then because the volume cannot be stopped if the bricks are unreachable.
We need not stop the bricks in that case, because the bricks are already stopped if the node itself is down.
@aravindavk I was considering the scenario where glusterd2 is down and the node is therefore reported as down to the other nodes.
We also need to consider what happens if the node has a network connectivity problem with etcd.
Here's the GD1 behavior: if the volume stop goes through while GD1 (or the whole node) is down on one of the nodes, the brick process on that node is brought down once GD1/the node comes back online.
@aravindavk, after the volume is stopped and while node 10.70.43.156 is down:
ps aux | grep gluster
root 25802 0.1 0.1 1022820 15076 ? Ssl 01:59 0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b1 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b1.pid -S /var/run/gluster/801d560da24eca5bcc35bd1c601c1c6e.socket --brick-name /bricks/brick1/vol1-b1 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b1.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49152 --xlator-option vol1-server.listen-port=49152
root 25821 0.1 0.1 1022820 12504 ? Ssl 01:59 0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b2 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b2.pid -S /var/run/gluster/23c8b75461d5de6f153cb61273d1c6b4.socket --brick-name /bricks/brick1/vol1-b2 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b2.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49153 --xlator-option vol1-server.listen-port=49153
root 25841 0.1 0.1 678320 8952 ? Ssl 01:59 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1faa167fb6e04fa8fbab068fa514f94a.socket --xlator-option *replicate*.node-uuid=162b044e-3185-4c00-a454-adbeb8e84d39
root 25924 0.0 0.0 112664 972 pts/0 S+ 02:01 0:00 grep --color=auto gluster
After starting glusterd on node 10.70.43.156
ps aux | grep gluster
root 25938 5.0 0.0 409932 6664 ? Ssl 02:01 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 26153 0.0 0.0 112660 968 pts/0 S+ 02:02 0:00 grep --color=auto gluster
after the volume is stopped and while node 10.70.43.156 is down:
We need the ps output for the case where the node is up but glusterd is down; same steps as mentioned in https://github.com/gluster/glusterd2/issues/1393#issuecomment-446601131
Initial ps output on node 10.70.43.156
ps aux | grep gluster
root 25468 0.5 0.0 604872 5496 ? Ssl 01:53 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 25686 0.0 0.0 112660 972 pts/0 S+ 01:53 0:00 grep --color=auto gluster
After volume "vol1" is created, ps output on node 10.70.43.156
ps aux | grep gluster
root 25468 0.1 0.1 604876 11160 ? Ssl 01:53 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 25802 0.2 0.1 1022820 12460 ? Ssl 01:59 0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b1 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b1.pid -S /var/run/gluster/801d560da24eca5bcc35bd1c601c1c6e.socket --brick-name /bricks/brick1/vol1-b1 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b1.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49152 --xlator-option vol1-server.listen-port=49152
root 25821 0.1 0.1 1022820 12464 ? Ssl 01:59 0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b2 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b2.pid -S /var/run/gluster/23c8b75461d5de6f153cb61273d1c6b4.socket --brick-name /bricks/brick1/vol1-b2 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b2.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49153 --xlator-option vol1-server.listen-port=49153
root 25841 0.4 0.1 678320 8940 ? Ssl 01:59 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1faa167fb6e04fa8fbab068fa514f94a.socket --xlator-option *replicate*.node-uuid=162b044e-3185-4c00-a454-adbeb8e84d39
root 25866 0.0 0.0 112660 972 pts/0 R+ 02:00 0:00 grep --color=auto gluster
When glusterd is stopped on the node
ps aux | grep gluster
root 25802 0.0 0.1 1022820 15036 ? Ssl 01:59 0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b1 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b1.pid -S /var/run/gluster/801d560da24eca5bcc35bd1c601c1c6e.socket --brick-name /bricks/brick1/vol1-b1 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b1.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49152 --xlator-option vol1-server.listen-port=49152
root 25821 0.0 0.1 1022820 12464 ? Ssl 01:59 0:00 /usr/sbin/glusterfsd -s 10.70.43.156 --volfile-id vol1.10.70.43.156.bricks-brick1-vol1-b2 -p /var/run/gluster/vols/vol1/10.70.43.156-bricks-brick1-vol1-b2.pid -S /var/run/gluster/23c8b75461d5de6f153cb61273d1c6b4.socket --brick-name /bricks/brick1/vol1-b2 -l /var/log/glusterfs/bricks/bricks-brick1-vol1-b2.log --xlator-option *-posix.glusterd-uuid=162b044e-3185-4c00-a454-adbeb8e84d39 --brick-port 49153 --xlator-option vol1-server.listen-port=49153
root 25841 0.1 0.1 678320 8940 ? Ssl 01:59 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1faa167fb6e04fa8fbab068fa514f94a.socket --xlator-option *replicate*.node-uuid=162b044e-3185-4c00-a454-adbeb8e84d39
root 25914 0.0 0.0 112664 972 pts/0 S+ 02:01 0:00 grep --color=auto gluster
When glusterd is down on node 10.70.43.156 and volume "vol1" is stopped, the ps output is similar to the one above.
After starting glusterd, ps output on 10.70.43.156
ps aux | grep gluster
root 25938 5.0 0.0 409932 6664 ? Ssl 02:01 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 26153 0.0 0.0 112660 968 pts/0 S+ 02:02 0:00 grep --color=auto gluster
@atinmu we can achieve similar behavior. Do you see any problem if a stale brick process exists until glusterd2 comes back up?
@aravindavk I still think that, functionality-wise, we should disallow volume stop if a node or a glusterd service hosting any of the volume's bricks is down. For example, take a distribute-only volume: if I stop the volume while the glusterd service on one node is down but that node itself is up and its glusterfsd process is still running, any existing mount can still read from and write to that brick, which is wrong in principle because the volume is in the stopped state (see the sketch below). What do you think?
But I'm trying to figure out whether there was any specific reason that led us to a different logic in GD1.
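The scenario above can be demonstrated from a client (a sketch; the mount point /mnt/vol1 is an assumption, and the mount must have been made while the volume was still started):
mount -t glusterfs 10.70.42.48:/vol1 /mnt/vol1   # done earlier, before the volume stop
echo test > /mnt/vol1/file1                      # after the stop, I/O that lands on the stale brick still succeeds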
Do you see any problem if a stale brick process exists until glusterd2 comes back up?
That's how GD1 currently behaves as well, but that is itself a problem and looks like a bug in GD1.
@Akarsha-rai if you can manage to figure out the RHBZ which triggered this test case to be added, it'd be awesome :-)