
Gluster volume status output not consistent on gd2 pods after delete/reboot of a gd2 pod on a GCS setup

Open rmadaka opened this issue 6 years ago • 8 comments

Observed behavior

After deleting/rebooting any one gd2 pod, log in to any other gd2 pod and check the volume status. The volume status output keeps changing from run to run.
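
One way to make the flapping visible (a minimal sketch; the endpoint is the same one used in the session pasted below, and the iteration count and sleep interval are arbitrary) is to run the status command in a loop from a surviving pod:

for i in $(seq 1 20); do
    glustercli volume status --endpoints=http://10.233.51.175:24007
    sleep 2
done

The raw output below is from repeated manual runs of the same command.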

No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| d5c60208-b5db-4752-a01b-0f2d5922d478 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick1/brick | false  |     0 |   0 |
| 8c4b581b-46a1-4460-b43d-ba7181689d10 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick2/brick | true   | 45063 |  57 |
| dbe6f89e-d584-4afd-9da4-9e324384d548 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick3/brick | false  |     0 |   0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Volume : pvc-d3006e55dce511e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true   | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true   | 44607 |  65 |
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false  |     0 |   0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| dbe6f89e-d584-4afd-9da4-9e324384d548 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick3/brick | false  |     0 |   0 |
| d5c60208-b5db-4752-a01b-0f2d5922d478 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick1/brick | false  |     0 |   0 |
| 8c4b581b-46a1-4460-b43d-ba7181689d10 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick2/brick | true   | 45063 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Volume : pvc-d3006e55dce511e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false  |     0 |   0 |
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true   | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true   | 44607 |  65 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found

Expected/desired behavior

Volume status output should be consistent: it should list the status of all volumes, and every brick's status should be reported correctly.

Details on how to reproduce (minimal and precise)

  1. Create PVC
  2. Delete/reboot any one of the gd2 pod
  3. Login to other GD2 pod which is not rebooted or deleted
  4. Then check the volume status
  5. The output varies between runs: sometimes it shows "No volumes found", sometimes "Error getting volume status", and sometimes the correct volume status (see the sketch after these steps).
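
For reference, steps 2-5 above map roughly onto the following commands (a sketch only; the gcs namespace is assumed from the *.glusterd2.gcs hostnames in the output above, while the pod names and endpoint are the ones appearing in this report; step 1, PVC creation, is omitted because the StorageClass used is not shown here):

# Step 2: delete/reboot one of the gd2 pods (namespace "gcs" is an assumption)
kubectl -n gcs delete pod gluster-kube1-0

# Step 3: log in to a gd2 pod that was not deleted/rebooted
kubectl -n gcs exec -it gluster-kube3-0 -- /bin/bash

# Steps 4-5: check the volume status repeatedly and compare the results
glustercli volume status --endpoints=http://10.233.51.175:24007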

Information about the environment:

  • Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev
  • Operating system used: CentOS Linux release 7.5.1804 (Core)
  • Glusterd2 compiled from sources, as a package (rpm/deb), or container: container
  • Using External ETCD: (yes/no, if yes ETCD version):
  • If container, which container image: docker.io/gluster/glusterd2-nightly
  • Using kubernetes, openshift, or direct install: kubernetes
  • If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: inside

Other useful information

  • glusterd2 config files from all nodes (default /etc/glusterd2/glusterd2.toml)
  • glusterd2 log files from all nodes (default /var/log/glusterd2/glusterd2.log)
  • ETCD configuration
  • Contents of uuid.toml from all nodes (default /var/lib/glusterd2/uuid.toml)
  • Output of statedump from any one of the node

Useful commands

  • To get glusterd2 version
    glusterd2 --version
    
  • To get ETCD version
    etcd --version
    
  • To get output of statedump
    curl http://glusterd2-IP:glusterd2-Port/statedump
    

rmadaka avatar Oct 31 '18 10:10 rmadaka

@rmadaka Can you check the GD2 logs once and paste them here?

vpandey-RH avatar Oct 31 '18 11:10 vpandey-RH

Sorry for the late reply; the old setup went into a bad state. I reproduced the above scenario again and pasted the logs below.

Logs:

time="2018-11-02 12:40:56.356381" level=info msg="10.233.64.1 - - [02/Nov/2018:12:40:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=a84699cf-6b0a-4ef3-a841-9c818deeff3b time="2018-11-02 12:41:56.355033" level=info msg="10.233.64.1 - - [02/Nov/2018:12:41:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=13c2856a-a710-4263-b682-fe3d526eacc6 time="2018-11-02 12:42:56.354790" level=info msg="10.233.64.1 - - [02/Nov/2018:12:42:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=e0ff5df0-9576-442a-8dc2-2b9e6616b9f7 time="2018-11-02 12:43:56.356649" level=info msg="10.233.64.1 - - [02/Nov/2018:12:43:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=ddd5ac8e-184b-4275-9873-41ef94aebdde time="2018-11-02 12:44:56.354932" level=info msg="10.233.64.1 - - [02/Nov/2018:12:44:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=bbfa8508-44c9-43cc-b4e3-9a0ab04ffa52 time="2018-11-02 12:45:56.357132" level=info msg="10.233.64.1 - - [02/Nov/2018:12:45:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=120d547d-0a0d-487f-8302-c325afa1b2e9 time="2018-11-02 12:46:56.355583" level=info msg="10.233.64.1 - - [02/Nov/2018:12:46:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=496eee66-5fad-417f-ae4b-fe72a443a7d1 time="2018-11-02 12:47:56.354818" level=info msg="10.233.64.1 - - [02/Nov/2018:12:47:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=8b67b7cc-c962-41e9-9c23-5b85bef50915 time="2018-11-02 12:48:22.841664" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(*livenessWatcher).Watch]" time="2018-11-02 12:48:53.233603" level=info msg="peer disconnected from store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:51:events.(*livenessWatcher).Watch]" time="2018-11-02 12:48:56.356002" level=info msg="10.233.64.1 - - [02/Nov/2018:12:48:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=0ff96806-ffde-4fd9-a60e-bb679dca7244 time="2018-11-02 12:49:56.356170" level=info msg="10.233.64.1 - - [02/Nov/2018:12:49:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=8e9afeb4-4a86-435d-982b-bd1b1a4a7a72 time="2018-11-02 12:50:09.888664" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] "GET /v1/volumes HTTP/1.1" 200 1387" reqid=55c21afd-5db6-45cb-b0ae-eb28453dbc54 time="2018-11-02 12:50:09.908669" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.908882" level=error msg="Step failed on node." 
error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.935845" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] "GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1" 200 2084" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 time="2018-11-02 12:50:23.146792" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] "GET /v1/volumes HTTP/1.1" 200 1387" reqid=35a3c2d3-d305-4787-9187-cc7906a4857e time="2018-11-02 12:50:23.167456" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.167626" level=error msg="Step failed on node." error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.195260" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] "GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1" 200 2084" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 time="2018-11-02 12:50:56.355403" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:56 +0000] "GET /ping HTTP/1.1" 200 0" reqid=c49009f7-acdf-4478-b03e-7153d83aa578 time="2018-11-02 12:51:17.533562" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(*livenessWatcher).Watch]" time="2018-11-02 12:51:17.789189" level=info msg="client connected" address="10.233.66.74:1023" server=sunrpc source="[server.go:155:sunrpc.(*SunRPC).acceptLoop]" transport=tcp time="2018-11-02 12:51:17.793524" level=info msg="client disconnected" address="10.233.66.74:1023" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"

rmadaka avatar Nov 02 '18 12:11 rmadaka

Providing output one more time:

[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.9.177:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log 
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log 
[root@gluster-kube2-0 /]# 
[root@gluster-kube2-0 /]# 
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+

rmadaka avatar Nov 02 '18 12:11 rmadaka

@vpandey-RH any update on this one?

Madhu-1 avatar Nov 05 '18 05:11 Madhu-1

Could this be related to #1054?

atinmu avatar Nov 13 '18 11:11 atinmu

@vpandey-RH Have we made any progress on this issue?

atinmu avatar Nov 21 '18 04:11 atinmu

@atinmu Not yet. I have not been able to give time to this issue. Will work on it.

vpandey-RH avatar Nov 21 '18 05:11 vpandey-RH

@rmadaka is this still valid with latest master?

atinmu avatar Dec 02 '18 08:12 atinmu