Gluster volume status output not consistent on gd2 pods after delete/reboot of a gd2 pod on a GCS setup
Observed behavior
After deleting/rebooting any one gd2 pod, log in to any other gd2 pod and check the volume status. The volume status output keeps changing between runs.
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status
Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| d5c60208-b5db-4752-a01b-0f2d5922d478 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick1/brick | false | 0 | 0 |
| 8c4b581b-46a1-4460-b43d-ba7181689d10 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick2/brick | true | 45063 | 57 |
| dbe6f89e-d584-4afd-9da4-9e324384d548 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick3/brick | false | 0 | 0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Volume : pvc-d3006e55dce511e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true | 44607 | 65 |
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false | 0 | 0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| dbe6f89e-d584-4afd-9da4-9e324384d548 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick3/brick | false | 0 | 0 |
| d5c60208-b5db-4752-a01b-0f2d5922d478 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick1/brick | false | 0 | 0 |
| 8c4b581b-46a1-4460-b43d-ba7181689d10 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick2/brick | true | 45063 | 57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Volume : pvc-d3006e55dce511e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false | 0 | 0 |
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true | 44607 | 65 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status
Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status
Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
Expected/desired behavior
Volume status output should be consistent: it should list the status of all volumes every time, and the brick status shown for each volume should be accurate.
Details on how to reproduce (minimal and precise)
- Create a PVC
- Delete/reboot any one of the gd2 pods
- Log in to another gd2 pod that was not rebooted or deleted
- Check the volume status
- The output varies between runs: sometimes "No volumes found", sometimes "Error getting volume status", and sometimes the correct status (a reproduction sketch follows this list)
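For reference, a minimal reproduction sketch (assuming kubectl access to the GCS cluster, the gcs namespace, and the pod/endpoint names seen in the observed output above; the PVC manifest name is a placeholder):

# 1. Create a PVC (pvc.yaml is a placeholder manifest for the gluster storage class)
kubectl apply -f pvc.yaml
# 2. Delete (or reboot) any one gd2 pod; the StatefulSet recreates it
kubectl -n gcs delete pod gluster-kube1-0
# 3. Log in to a gd2 pod that was not deleted/rebooted
kubectl -n gcs exec -it gluster-kube3-0 -- /bin/bash
# 4. From inside that pod, check the volume status repeatedly
glustercli volume status --endpoints=http://10.233.51.175:24007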
Information about the environment:
- Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev
- Operating system used: CentOS Linux release 7.5.1804 (Core)
- Glusterd2 compiled from sources, as a package (rpm/deb), or container: container
- Using External ETCD: (yes/no, if yes ETCD version):
- If container, which container image: docker.io/gluster/glusterd2-nightly
- Using kubernetes, openshift, or direct install: kubernetes
- If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: inside
Other useful information
- glusterd2 config files from all nodes (default /etc/glusterd2/glusterd2.toml)
- glusterd2 log files from all nodes (default /var/log/glusterd2/glusterd2.log)
- ETCD configuration
- Contents of uuid.toml from all nodes (default /var/lib/glusterd2/uuid.toml)
- Output of statedump from any one of the nodes
Useful commands
- To get glusterd2 version: glusterd2 --version
- To get ETCD version: etcd --version
- To get output of statedump: curl http://glusterd2-IP:glusterd2-Port/statedump
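A minimal collection sketch for gathering the information requested above from each gd2 pod (assuming kubectl access and the namespace/pod names used in this setup; paths are the defaults listed above):

# version, config, uuid.toml and recent logs from every gd2 pod
for pod in gluster-kube1-0 gluster-kube2-0 gluster-kube3-0; do
  echo "== $pod =="
  kubectl -n gcs exec "$pod" -- glusterd2 --version
  kubectl -n gcs exec "$pod" -- cat /etc/glusterd2/glusterd2.toml
  kubectl -n gcs exec "$pod" -- cat /var/lib/glusterd2/uuid.toml
  kubectl -n gcs exec "$pod" -- tail -n 200 /var/log/glusterd2/glusterd2.log
done
# statedump from any one node (substitute that pod's glusterd2 IP and port)
curl http://glusterd2-IP:glusterd2-Port/statedump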
@rmadaka Can you check the GD2 logs once and paste them here?
Sorry for the late reply; the old setup went into a bad state, so I reproduced the above scenario again and pasted the logs below. Logs:
time="2018-11-02 12:40:56.356381" level=info msg="10.233.64.1 - - [02/Nov/2018:12:40:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=a84699cf-6b0a-4ef3-a841-9c818deeff3b time="2018-11-02 12:41:56.355033" level=info msg="10.233.64.1 - - [02/Nov/2018:12:41:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=13c2856a-a710-4263-b682-fe3d526eacc6 time="2018-11-02 12:42:56.354790" level=info msg="10.233.64.1 - - [02/Nov/2018:12:42:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=e0ff5df0-9576-442a-8dc2-2b9e6616b9f7 time="2018-11-02 12:43:56.356649" level=info msg="10.233.64.1 - - [02/Nov/2018:12:43:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=ddd5ac8e-184b-4275-9873-41ef94aebdde time="2018-11-02 12:44:56.354932" level=info msg="10.233.64.1 - - [02/Nov/2018:12:44:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=bbfa8508-44c9-43cc-b4e3-9a0ab04ffa52 time="2018-11-02 12:45:56.357132" level=info msg="10.233.64.1 - - [02/Nov/2018:12:45:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=120d547d-0a0d-487f-8302-c325afa1b2e9 time="2018-11-02 12:46:56.355583" level=info msg="10.233.64.1 - - [02/Nov/2018:12:46:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=496eee66-5fad-417f-ae4b-fe72a443a7d1 time="2018-11-02 12:47:56.354818" level=info msg="10.233.64.1 - - [02/Nov/2018:12:47:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=8b67b7cc-c962-41e9-9c23-5b85bef50915 time="2018-11-02 12:48:22.841664" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(*livenessWatcher).Watch]" time="2018-11-02 12:48:53.233603" level=info msg="peer disconnected from store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:51:events.(*livenessWatcher).Watch]" time="2018-11-02 12:48:56.356002" level=info msg="10.233.64.1 - - [02/Nov/2018:12:48:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=0ff96806-ffde-4fd9-a60e-bb679dca7244 time="2018-11-02 12:49:56.356170" level=info msg="10.233.64.1 - - [02/Nov/2018:12:49:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=8e9afeb4-4a86-435d-982b-bd1b1a4a7a72 time="2018-11-02 12:50:09.888664" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] "GET /v1/volumes HTTP/1.1" 200 1387" reqid=55c21afd-5db6-45cb-b0ae-eb28453dbc54 time="2018-11-02 12:50:09.908669" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.908882" level=error msg="Step failed on node." 
error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.935845" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] "GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1" 200 2084" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 time="2018-11-02 12:50:23.146792" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] "GET /v1/volumes HTTP/1.1" 200 1387" reqid=35a3c2d3-d305-4787-9187-cc7906a4857e time="2018-11-02 12:50:23.167456" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.167626" level=error msg="Step failed on node." error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.195260" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] "GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1" 200 2084" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 time="2018-11-02 12:50:56.355403" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=c49009f7-acdf-4478-b03e-7153d83aa578 time="2018-11-02 12:51:17.533562" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(*livenessWatcher).Watch]" time="2018-11-02 12:51:17.789189" level=info msg="client connected" address="10.233.66.74:1023" server=sunrpc source="[server.go:155:sunrpc.(*SunRPC).acceptLoop]" transport=tcp time="2018-11-02 12:51:17.793524" level=info msg="client disconnected" address="10.233.66.74:1023" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
Providing output one more time:
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
Error getting volume status
Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.9.177:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log
[root@gluster-kube2-0 /]#
[root@gluster-kube2-0 /]#
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
@vpandey-RH any update on this one?
Could this be related to #1054?
@vpandey-RH Have we made any progress on this issue?
@atinmu Not yet. I have not been able to give time to this issue. Will work on it.
@rmadaka is this still valid with latest master?