glusterd2

Bricks are not running and volume stop failed

Open Shrivaibavi opened this issue 6 years ago • 6 comments

Observed behavior

[root@dhcp35-30 ~]# glustercli volume list
+--------------------------------------+------+-----------------------+---------+-----------+--------+
|                  ID                  | NAME |         TYPE          |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
| e42c6604-6cf7-4aac-bd5e-dfe0cd674d4f | abc  | Replicate             | Stopped | tcp       | 3      |
| dd797f61-0c20-4894-8447-9b734e21f63b | dif  | Replicate             | Stopped | tcp       | 3      |
| 7c0b0f49-9c61-4641-b688-c0a478924ba9 | xy   | Distributed-Replicate | Started | tcp       | 6      |
| e4f4ba5b-89c9-4b53-a2d3-5e7e6935804a | xyz  | Distributed-Replicate | Started | tcp       | 6      |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# clear

[root@dhcp35-30 ~]# glustercli volume list
+--------------------------------------+------+-----------------------+---------+-----------+--------+
|                  ID                  | NAME |         TYPE          |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
| e42c6604-6cf7-4aac-bd5e-dfe0cd674d4f | abc  | Replicate             | Stopped | tcp       | 3      |
| dd797f61-0c20-4894-8447-9b734e21f63b | dif  | Replicate             | Stopped | tcp       | 3      |
| 7c0b0f49-9c61-4641-b688-c0a478924ba9 | xy   | Distributed-Replicate | Started | tcp       | 6      |
| e4f4ba5b-89c9-4b53-a2d3-5e7e6935804a | xyz  | Distributed-Replicate | Started | tcp       | 6      |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume start abc
Volume abc started successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume start dif
Volume dif started successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop dif
Volume dif stopped successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop abc
Volume abc stopped successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume delete abc
Are you sure you want to delete volume abc [yes/no]? yes
Volume abc deleted successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume delete dif
Are you sure you want to delete volume dif [yes/no]? yes
Volume dif deleted successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop xy
Volume xy stopped successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop xyz
Volume stop failed

Failed to connect to glusterd. Please check if
- Glusterd is running(http://127.0.0.1:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume list
+--------------------------------------+------+-----------------------+---------+-----------+--------+
|                  ID                  | NAME |         TYPE          |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
| 7c0b0f49-9c61-4641-b688-c0a478924ba9 | xy   | Distributed-Replicate | Stopped | tcp       | 6      |
| e4f4ba5b-89c9-4b53-a2d3-5e7e6935804a | xyz  | Distributed-Replicate | Started | tcp       | 6      |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# systemctl status glusterd2
● glusterd2.service - GlusterD2, the management service for GlusterFS (pre-release)
   Loaded: loaded (/usr/lib/systemd/system/glusterd2.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-12-19 16:50:26 IST; 16min ago
 Main PID: 4022 (glusterd2)
   CGroup: /system.slice/glusterd2.service
           ├─ 4022 /usr/sbin/glusterd2 --config=/etc/glusterd2/glusterd2.toml
           ├─15182 /usr/sbin/glusterfs -s localhost --volfile-server-port 24007 --volfile-id gluster/...
           └─20932 /usr/sbin/glusterfsd --volfile-server 127.0.0.1 --volfile-server-port 24007 --volf...

Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: dlfcn 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: libpthread 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: llistxattr 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: setfsid 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: spinlock 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: epoll.h 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: xattr.h 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: st_atim.tv_nsec 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: package-string: glusterfs...
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: ---------
Hint: Some lines were ellipsized, use -l to show in full.
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume status xyz
Volume : xyz
+--------------------------------------+--------------+---------------------+--------+------+-----+
|               BRICK ID               |     HOST     |        PATH         | ONLINE | PORT | PID |
+--------------------------------------+--------------+---------------------+--------+------+-----+
| cc435430-91cb-4967-9743-ee11a0c28597 | 10.70.35.240 | /bricks/brick1/xyz2 | false  |    0 |   0 |
| c588e555-8c45-4d91-937c-d468cfd84a94 | 10.70.35.30  | /bricks/brick1/xyz3 | false  |    0 |   0 |
| 7454a53e-2fad-48b4-9933-137ff0845df8 | 10.70.35.106 | /bricks/brick1/xyz4 | false  |    0 |   0 |
| d472b3ae-5ddb-49d3-b435-826415a54dd4 | 10.70.35.240 | /bricks/brick1/xyz5 | false  |    0 |   0 |
| 572dfadc-6f79-4460-b356-549677d035ca | 10.70.35.30  | /bricks/brick1/xyz0 | false  |    0 |   0 |
| c7012149-5658-4e97-aebe-1ecdd3b093e7 | 10.70.35.106 | /bricks/brick1/xyz1 | false  |    0 |   0 |
+--------------------------------------+--------------+---------------------+--------+------+-----+
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# curl -i -XPOST http://localhost:24007/v1/volumes/xyz/stop
HTTP/1.1 500 Internal Server Error
Content-Type: application/json; charset=UTF-8
X-Gluster-Cluster-Id: 8aa6daa1-3d77-4df3-a938-115f5797fd2a
X-Gluster-Peer-Id: 2115479a-c493-4b44-9119-aa78b0dfcd5e
X-Request-Id: 17a34195-7c07-4f6d-bb71-52e699918a4a
Date: Wed, 19 Dec 2018 11:40:33 GMT
Content-Length: 686

{"errors":[{"code":2,"message":"a txn step failed","fields":{"error":"dial unix /var/run/glusterd2/62d8873b75178964.socket: connect: connection refused","peer-id":"2115479a-c493-4b44-9119-aa78b0dfcd5e","step":"vol-stop.StopBricks"}},{"code":2,"message":"a txn step failed","fields":{"error":"dial unix /var/run/glusterd2/2733bc25947ab39f.socket: connect: connection refused","peer-id":"3aa60137-4a3b-4c88-8ff7-4b4bf5fa87d4","step":"vol-stop.StopBricks"}},{"code":2,"message":"a txn step failed","fields":{"error":"dial unix /var/run/glusterd2/e51b66a88b708c4d.socket: connect: no such file or directory","peer-id":"2423abfc-db94-4974-96cc-3d3af3e36753","step":"vol-stop.StopBricks"}}]}

Expected/desired behavior

The volume stop should be successful

Details on how to reproduce (minimal and precise)

Information about the environment:

  • Glusterd2 version used (e.g. v4.1.0 or master): glusterd version: v6.0-dev.88.gitea22407 git SHA: ea22407 go version: go1.11.2 go OS/arch: linux/amd64

  • Operating system used:

  • Glusterd2 compiled from sources, as a package (rpm/deb), or container:

  • Using External ETCD: (yes/no, if yes ETCD version):

  • If container, which container image:

  • Using kubernetes, openshift, or direct install:

  • If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside:

Other useful information

  • glusterd2 config files from all nodes (default /etc/glusterd2/glusterd2.toml)

[root@dhcp35-30 ~]# cat /etc/glusterd2/glusterd2.toml

localstatedir = "/var/lib/glusterd2"
logdir = "/var/log/glusterd2"
logfile = "glusterd2.log"
loglevel = "INFO"
rundir = "/var/run/glusterd2"
defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"
restauth = false
etcdendpoints = "http://10.70.35.173:2379"
noembed = true

  • glusterd2 log files from all nodes (default /var/log/glusterd2/glusterd2.log)
time="2018-12-19 12:38:37.191206" level=info msg="client connected" address="10.70.35.106:719" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.194308" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy0 error="SearchByBrickPath: port for brick /bricks/brick1/xy0 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.195592" level=info msg="client disconnected" address="10.70.35.106:719" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:37.202375" level=info msg="client connected" address="10.70.35.106:717" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.203989" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy3 error="SearchByBrickPath: port for brick /bricks/brick1/xy3 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.204598" level=info msg="client disconnected" address="10.70.35.106:717" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:37.219186" level=info msg="client connected" address="10.70.35.106:714" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.222138" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc1 error="SearchByBrickPath: port for brick /bricks/brick0/abc1 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.223205" level=info msg="client disconnected" address="10.70.35.106:714" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:37.225297" level=info msg="client connected" address="10.70.35.106:713" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.226908" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc1 error="SearchByBrickPath: port for brick /bricks/brick0/abc1 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.227513" level=info msg="client disconnected" address="10.70.35.106:713" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:37.528458" level=info msg="client connected" address="10.70.35.30:1006" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.529887" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy0 error="SearchByBrickPath: port for brick /bricks/brick1/xy0 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.530877" level=info msg="client disconnected" address="10.70.35.30:1006" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:37.533653" level=info msg="client connected" address="10.70.35.30:970" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.534826" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy3 error="SearchByBrickPath: port for brick /bricks/brick1/xy3 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.535109" level=info msg="client disconnected" address="10.70.35.30:970" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:37.551247" level=info msg="client connected" address="10.70.35.30:855" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.554776" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc1 error="SearchByBrickPath: port for brick /bricks/brick0/abc1 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.555268" level=info msg="client disconnected" address="10.70.35.30:855" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:37.555734" level=info msg="client connected" address="10.70.35.30:854" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:37.556912" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc1 error="SearchByBrickPath: port for brick /bricks/brick0/abc1 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:37.557201" level=info msg="client disconnected" address="10.70.35.30:854" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:38.094089" level=info msg="client connected" address="10.70.35.240:701" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:38.097453" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/dif error="SearchByBrickPath: port for brick /bricks/brick0/dif not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:38.098036" level=info msg="client disconnected" address="10.70.35.240:701" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:38.599291" level=info msg="client connected" address="10.70.35.30:845" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:38:38.600734" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/dif error="SearchByBrickPath: port for brick /bricks/brick0/dif not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:38:38.601041" level=info msg="client disconnected" address="10.70.35.30:845" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:38:39.277770" level=info msg="client connected" address="10.70.35.106:1009" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.553819" level=info msg="client connected" address="10.70.35.106:799" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.555652" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick2/dif error="SearchByBrickPath: port for brick /bricks/brick2/dif not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.556304" level=info msg="client disconnected" address="10.70.35.106:799" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.912431" level=info msg="client connected" address="10.70.35.30:996" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.915921" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy2 error="SearchByBrickPath: port for brick /bricks/brick1/xy2 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.916735" level=info msg="client disconnected" address="10.70.35.30:996" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.930666" level=info msg="client connected" address="10.70.35.30:980" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.932044" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy5 error="SearchByBrickPath: port for brick /bricks/brick1/xy5 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.932528" level=info msg="client disconnected" address="10.70.35.30:980" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.958360" level=info msg="client connected" address="10.70.35.30:942" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.960856" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc3 error="SearchByBrickPath: port for brick /bricks/brick0/abc3 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.961836" level=info msg="client disconnected" address="10.70.35.30:942" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.962808" level=info msg="client connected" address="10.70.35.30:941" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.963972" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc3 error="SearchByBrickPath: port for brick /bricks/brick0/abc3 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.964371" level=info msg="client disconnected" address="10.70.35.30:941" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.967270" level=info msg="client connected" address="10.70.35.30:940" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp

brick logs

[2018-12-19 11:20:37.730143] I [glusterfsd-mgmt.c:926:glusterfs_handle_attach] 0-glusterfs: got attach for /var/lib/glusterd2/volfiles/xy.3aa60137-4a3b-4c88-8ff7-4b4bf5fa87d4.bricks-brick1-xy5.vol
[2018-12-19 11:20:37.747910] I [socket.c:902:__socket_server_bind] 1-socket.xy-changelog: closing (AF_UNIX) reuse check socket 14
[2018-12-19 11:20:37.749565] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-12-19 11:20:38.436434] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy2: allowed = "*", received addr = "10.70.35.30"
[2018-12-19 11:20:38.436537] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy2
[2018-12-19 11:20:38.444116] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy5: allowed = "*", received addr = "10.70.35.30"
[2018-12-19 11:20:38.444200] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-1-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy5
[2018-12-19 11:20:39.643828] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy2: allowed = "*", received addr = "10.70.35.106"
[2018-12-19 11:20:39.643885] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy2
[2018-12-19 11:20:39.667300] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy5: allowed = "*", received addr = "10.70.35.106"
[2018-12-19 11:20:39.667343] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-1-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy5
[2018-12-19 11:20:43.427703] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy2: allowed = "*", received addr = "10.70.35.240"
[2018-12-19 11:20:43.427749] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy2
[2018-12-19 11:20:43.463992] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy5: allowed = "*", received addr = "10.70.35.240"
[2018-12-19 11:20:43.464031] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-1-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy5
[2018-12-19 11:36:31.979004] I [glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /bricks/brick1/xy2
[2018-12-19 11:36:31.979293] I [server.c:1640:server_notify] 0-xy-server: disconnecting 10.70.35.30:959
[2018-12-19 11:36:31.979383] I [socket.c:811:__socket_shutdown] 0-tcp.xy-server: intentional socket shutdown(15)
[2018-12-19 11:36:31.979442] I [server.c:1640:server_notify] 0-xy-server: disconnecting 10.70.35.106:952
[2018-12-19 11:36:31.979494] I [socket.c:811:__socket_shutdown] 0-tcp.xy-server: intentional socket shutdown(17)
[2018-12-19 11:36:31.979511] I [server.c:1640:server_notify] 0-xy-server: disconnecting 10.70.35.240:955
[2018-12-19 11:36:31.979566] I [socket.c:811:__socket_shutdown] 0-tcp.xy-server: intentional socket shutdown(19)
[2018-12-19 11:36:31.979595] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-xy-server: disconnecting connection from CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.979929] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-xy-server: disconnecting connection from CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980088] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-xy-server: disconnecting connection from CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980613] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-xy-server: Shutting down connection CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980922] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-xy-server: Shutting down connection CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980924] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-12-19 11:36:31.980997] I [server.c:408:server_call_xlator_mem_cleanup] 0-xy-server: Create graph janitor thread for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.980613] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-xy-server: Shutting down connection CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.981155] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-12-19 11:36:31.981306] E [xlator.c:1432:glusterfs_delete_volfile_checksum] 0-xy-server: failed to get volfile checksum for volfile id xy.3aa60137-4a3b-4c88-8ff7-4b4bf5fa87d4.bricks-brick1-xy2.
[2018-12-19 11:36:31.981357] I [index.c:2604:notify] 0-xy-index: Notify GF_EVENT_PARENT_DOWN for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.981471] I [io-threads.c:1312:notify] 0-xy-io-threads: Notify GF_EVENT_PARENT_DOWN for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.981526] I [changelog.c:2022:notify] 0-xy-changelog: cleanup changelog rpc connection of brick /bricks/brick1/xy2
[2018-12-19 11:36:31.983198] I [socket.c:811:__socket_shutdown] 0-socket.xy-changelog: intentional socket shutdown(12)
[2018-12-19 11:36:31.983257] I [posix-common.c:158:posix_notify] 0-xy-posix: Sending CHILD_DOWN for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.983389] I [server.c:1586:server_notify] 0-xy-server: Getting CHILD_DOWN event for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.983427] I [server.c:617:server_graph_janitor_threads] 0-xy-server: Start call fini for brick /bricks/brick1/xy2 stack
[2018-12-19 11:36:31.983504] E [rpcsvc.c:1825:rpcsvc_get_listener] 0-rpc-service: invalid port for listener socket.xy-changelog
[2018-12-19 11:36:31.987658] I [barrier.c:665:fini] 0-xy-barrier: Disabling barriering and dequeuing all the queued fops
[2018-12-19 11:36:31.988311] I [io-stats.c:4023:fini] 0-/bricks/brick1/xy2: io-stats translator unloaded
[2018-12-19 11:36:32.183876] I [glusterfsd-mgmt.c:260:glusterfs_handle_terminate] 0-glusterfs: terminating after loss of last child /bricks/brick1/xy5
[2018-12-19 11:36:31.981428] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-12-19 11:36:32.184281] W [glusterfsd.c:1543:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fe27358ddd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x56159e6ac105] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6d) [0x56159e6abf5d] ) 0-: received signum (15), shutting down
  • ETCD configuration
  • Contents of uuid.toml from all nodes (default /var/lib/glusterd2/uuid.toml)
  • Output of statedump from any one of the node

Useful commands

  • To get glusterd2 version
    glusterd2 --version
    
  • To get ETCD version
    etcd --version
    
  • To get output of statedump
    curl http://glusterd2-IP:glusterd2-Port/statedump
    

Dec 19 '18 12:12 Shrivaibavi

Can you paste the brick logs from the time when you started volume "xyz"?

Dec 19 '18 13:12 rishubhjain

@rishubhjain The logs are quite big. You can use my machine instead if you want to debug: 10.70.35.30

Dec 19 '18 14:12 Shrivaibavi

Two fixes are required.

  1. glustercli: it blindly catches any "Connection Refused" error and reports that glusterd2 is probably down. https://github.com/gluster/glusterd2/blob/master/glustercli/cmd/common.go#L43
  2. In all brick ops, handle the connection refused error and act accordingly. For example, when stopping a brick, glusterd2 tries to connect and send a brick op to stop it; if the connection is refused, the error should be ignored.

Dec 20 '18 03:12 aravindavk

@harigowtham What's the update on this?

Jan 28 '19 12:01 atinmu

Unable to reproduce the issue; I will try a few more times. One error message is odd: one of the nodes reports "no such file or directory" rather than "connection refused". I need to look deeper into why that one is different.

Jan 28 '19 13:01 harigowtham

@harigowtham Try stopping the volume immediately after a glusterd2 restart (before the bricks connect back to glusterd2).

Jan 29 '19 03:01 aravindavk