
volume info shows the volume state as started but volume status shows it as offline

Open · PrasadDesala opened this issue on Jan 02 '19 · 3 comments

Observed behavior

volume info shows the volume as started but volume status shows it as offline.

[root@gluster-kube1-0 bricks]# glustercli volume status pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
Volume : pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
| BRICK ID                             | HOST                          | PATH                                                                                    | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
| 4112110f-4443-4d85-9bd8-1914efcee897 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick | false  | 0    | 0   |
| 6927f476-a7e8-40cd-8205-a7668d667ada | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick2/brick | false  | 0    | 0   |
| 2584d0c9-c6d7-4d3d-94b6-6b909fbf4b67 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick | false  | 0    | 0   |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+

[root@gluster-kube1-0 bricks]# glustercli volume info pvc-46967f93-0e6e-11e9-af0b-525400f94cb8

Volume Name: pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
Type: Replicate
Volume ID: d5f6bb15-8d2d-4b27-a700-96e86ad972e4
State: Started
Capacity: 1.0 GiB
Transport-type: tcp
Options:
    performance/io-cache.io-cache: off
    performance/write-behind.write-behind: off
    cluster/replicate.self-heal-daemon: on
    debug/io-stats.count-fop-hits: on
    performance/open-behind.open-behind: off
    performance/quick-read.quick-read: off
    performance/read-ahead.read-ahead: off
    performance/readdir-ahead.readdir-ahead: off
    debug/io-stats.latency-measurement: on
    performance/md-cache.md-cache: off
Number of Bricks: 3
Brick1: gluster-kube3-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick
Brick2: gluster-kube2-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick2/brick
Brick3: gluster-kube1-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick

And volume start on that volume fails with the response "volume already started":

[root@gluster-kube1-0 bricks]# glustercli volume start pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
volume start failed

Response headers:
X-Gluster-Cluster-Id: 98ef5bef-583c-41ee-b594-50f3d4784679
X-Gluster-Peer-Id: 4e752b45-aa0a-4784-83f7-6b487e886b4d
X-Request-Id: 16694478-f0d7-4488-a440-6f744e476bad

Response body: volume already started

Expected/desired behavior

volume status shows the volume as online.

Details on how to reproduce (minimal and precise)

  1. Create a 3-node GCS system using Vagrant.
  2. With brick-mux enabled, create 100 PVCs.
  3. Stop all the volumes, then start them all again (a scripted version is sketched right after this list):

glustercli volume stop pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8
glustercli volume stop pvc-38485e48-0e6e-11e9-af0b-525400f94cb8
glustercli volume stop pvc-38853d16-0e6e-11e9-af0b-525400f94cb8
glustercli volume stop pvc-38c06414-0e6e-11e9-af0b-525400f94cb8
glustercli volume stop pvc-38f92094-0e6e-11e9-af0b-525400f94cb8
glustercli volume stop pvc-39337ec5-0e6e-11e9-af0b-525400f94cb8
glustercli volume stop pvc-396d7779-0e6e-11e9-af0b-525400f94cb8
...

Same way, start all the volumes:

glustercli volume start pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8
glustercli volume start pvc-38485e48-0e6e-11e9-af0b-525400f94cb8
glustercli volume start pvc-38853d16-0e6e-11e9-af0b-525400f94cb8
glustercli volume start pvc-38c06414-0e6e-11e9-af0b-525400f94cb8
...
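For reference, a scripted version of step 3 might look like the sketch below. It assumes the PVC volume names are collected one per line in a plain text file (volumes.txt is a hypothetical name); because the loops run sequentially, every stop returns before the first start is issued.

```bash
#!/bin/bash
# Hypothetical helper for step 3: stop all PVC volumes, then start them again.
# volumes.txt is assumed to contain one volume name per line, e.g.
# pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8

# Stop every volume; each glustercli call returns before the next one runs.
while read -r vol; do
    glustercli volume stop "$vol"
done < volumes.txt

# Only after all stops have returned, start the volumes in the same order.
while read -r vol; do
    glustercli volume start "$vol"
done < volumes.txt
```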

Information about the environment:

  • Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev.94.git601ba61
  • Operating system used: CentOS 7.6
  • Glusterd2 compiled from sources, as a package (rpm/deb), or container:
  • Using External ETCD: (yes/no, if yes ETCD version): yes; version 3.3.8
  • If container, which container image:
  • Using kubernetes, openshift, or direct install:
  • If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: Kubernetes
  • Output of statedump from any one of the nodes

PrasadDesala · Jan 02 '19 11:01

@PrasadDesala I will need more information than just the volume status, as there could be numerous reasons why the port shows as 0. One of them could be that the bricks need some time to sign in after glusterfsd has been spawned. Can you get me the output of `ps aux | grep glusterfsd` after you start the PVCs?

Also, can you tell me whether the volume stop and start requests are sequential? That is, are the start requests sent only after all the stop requests have been sent, or do the requests have no particular ordering?

Also, after all the start requests have been sent and returned successfully, can you give the bricks some time to sign in and then check whether you still see the same behaviour?
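Something like the following could be used to give the bricks time and re-check automatically (a rough sketch; it simply greps the human-readable status table for an ONLINE value of false, so treat it as illustrative rather than a supported interface):

```bash
#!/bin/bash
# Poll the volume status until no brick row reports ONLINE=false,
# or give up after roughly five minutes.
VOL=pvc-46967f93-0e6e-11e9-af0b-525400f94cb8   # volume from this issue; adjust as needed

for i in $(seq 1 30); do
    if ! glustercli volume status "$VOL" | grep -q ' false '; then
        echo "all bricks of $VOL report online"
        break
    fi
    sleep 10
done

glustercli volume status "$VOL"
```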

vpandey-RH · Jan 02 '19 12:01

> @PrasadDesala I will need more information than just the volume status, as there could be numerous reasons why the port shows as 0. One of them could be that the bricks need some time to sign in after glusterfsd has been spawned. Can you get me the output of `ps aux | grep glusterfsd` after you start the PVCs?

[root@gluster-kube1-0 bricks]# ps aux | grep glusterfsd
root      8113  0.0  0.0     9088    672 pts/2   S+   12:54   0:00 grep --color=auto glusterfsd
root     21733  8.5  2.6 13825056 870656 ?       Ssl  11:34   6:52 /usr/sbin/glusterfsd --volfile-server gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8.4e752b45-aa0a-4784-83f7-6b487e886b4d.var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick -p /var/run/glusterd2/4e752b45-aa0a-4784-83f7-6b487e886b4d-var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.pid -S /var/run/glusterd2/3c5c17b3422e2a07.socket --brick-name /var/run/glusterd2/bricks/pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick -l /var/log/glusterd2/glusterfs/bricks/var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=4e752b45-aa0a-4784-83f7-6b487e886b4d

> Also, can you tell me whether the volume stop and start requests are sequential? That is, are the start requests sent only after all the stop requests have been sent, or do the requests have no particular ordering?

Volumes are started only after all the stop requests have completed: stop 100 volumes {1..100} --> wait until volume stop completes on all of them --> start 100 volumes {1..100}.

> Also, after all the start requests have been sent and returned successfully, can you give the bricks some time to sign in and then check whether you still see the same behaviour?

It's been more than an hour since I hit this issue, and volume status still shows the bricks as not online.

I see the error below in the glusterd2 logs when I tried to start the volume, even though the path /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick is present:

[root@gluster-kube1-0 bricks]# ll /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick
total 0

time="2019-01-02 11:49:10.525084" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"

@vpandey-RH Let me know if you need any other information; the system is still in the same state.

PrasadDesala · Jan 02 '19 12:01

@vpandey-RH did we figure out the cause of this state?

atinmu · Jan 16 '19 11:01