
Thin-Arbiter-Volumes.md has inaccurate and incomplete information.

Open alphabet5 opened this issue 4 years ago • 13 comments

  • The docs use the command 'glustercli', which is not referenced anywhere else in the docs except far back in the changelog for v4.

  • The command to create a thin-arbiter volume file does not work. A file is not created, resulting in the following showing up in the logs.

[2020-12-12 22:52:47.880412] E [MSGID: 100009] [glusterfsd.c:633:get_volfp] 0-glusterfsd: loading volume file failed [{volume_file=/mnt/brick1/gvolume0/thin-arbiter.vol}, {errno=2}, {error=No such file or directory}]
  • The command to create a volume is incorrect.
glustercli volume create <volname> --replica 2 <host1>:<brick1> <host2>:<brick2> --thin-arbiter <quorum-host>:<path-to-store-replica-id-file>

Should be changed to:

gluster volume create <volname> replica 2 thin-arbiter 1 <host1>:<brick1> <host2>:<brick2> <quorum-host>:<path-to-store-replica-id-file>
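As a minimal sketch of the corrected syntax above, the helper below only prints the command for review and does not run it. The function name `ta_create_cmd` is hypothetical, and the host names and brick paths in the example call are placeholders, not values from a real cluster.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints the thin-arbiter create command
# in the "replica 2 thin-arbiter 1" form instead of executing it.
# Usage: ta_create_cmd VOLNAME HOST1:BRICK1 HOST2:BRICK2 TA_HOST:TA_PATH
ta_create_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "usage: ta_create_cmd VOLNAME BRICK1 BRICK2 TA_BRICK" >&2
        return 1
    fi
    # The two data bricks come first; the thin-arbiter host:path is last.
    printf 'gluster volume create %s replica 2 thin-arbiter 1 %s %s %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example call with placeholder hosts and paths:
ta_create_cmd gvolume0 \
    server1:/mnt/brick1/gvolume0 \
    server2:/mnt/brick1/gvolume0 \
    arbiter:/mnt/brick1/gvolume0
```

Printing first makes it easy to check the argument order before pasting the command on a node in the trusted storage pool.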

alphabet5 avatar Dec 12 '20 23:12 alphabet5

@Sheetalpamecha @aspandey @itisravi @karthik-us can you please have a look when you get a chance... thanks.

amarts avatar Dec 13 '20 06:12 amarts

After some more digging, there is this: https://review.gluster.org/#/c/glusterfs/+/20056/

Which has a script to assist with configuring a service for the thin-arbiter process, as well as a template .vol file at glusterfs/extras/thin-arbiter/thin-arbiter/thin-arbiter.vol

It appears that you can't run a thin-arbiter on a node that is also running glusterd by default. (I was trying to test out 1 node from cluster2 being a thin-arbiter for cluster1)

A couple of things that I can't seem to find:

  • How can you view the status of the thin-arbiter? (gluster volume status doesn't show the thin-arbiter.)
  • Does the thin-arbiter need to be a peer?

It looks like the arbiter for 8.3 is not the same op-version as glusterfs?

# gluster peer probe arbiter
peer probe: failed: Peer arbiter does not support required op-version

alphabet5 avatar Dec 13 '20 18:12 alphabet5

@alphabet5

  • thin-arbiter process is supposed to be run on a node outside the gluster trusted storage pool (i.e. where there is no glusterd running). So it must not be a peer.
  • the script that you identified sets up the ta process as a systemd service which will auto start the process even if you kill it or it dies, so ideally there is no need to check its status. You can still ps aux|grep gluster on the thin arbiter node to find its pid.
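The `ps aux|grep gluster` check can be narrowed a little. This is only a sketch: it assumes the thin-arbiter process is a `glusterfsd` started with `--volfile-id ta`, which is what the systemd unit output later in this thread shows, and the function name `ta_pids` is made up for illustration.

```shell
#!/bin/sh
# Read `ps aux` output on stdin and print the PID (second column) of every
# glusterfsd process whose command line contains "--volfile-id ta".
# Assumption: the thin-arbiter service uses exactly that volfile id.
ta_pids() {
    awk '/glusterfsd/ && /--volfile-id ta( |$)/ { print $2 }'
}

# On the thin-arbiter node:
#   ps aux | ta_pids
```

An empty result only means no such process was found. A PID being present says nothing about whether clients can actually reach the thin arbiter.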

itisravi avatar Dec 14 '20 05:12 itisravi

Thanks @itisravi. Is there a way to verify the status of the arbiter? If the arbiter is online, but unreachable from the cluster, how would I know?

I don't really want to take a brick offline to see if the arbiter still allows writes to the other brick. Is there another way to verify the arbiter status?

alphabet5 avatar Dec 15 '20 14:12 alphabet5

telnet <thin-arbiter-node> 24007

Ctl-]

amarts avatar Dec 15 '20 15:12 amarts

@amarts how does this verify that the arbiter is working?

# telnet arbiter 24007
Trying 192.168.1.254...
Connected to arbiter.
Escape character is '^]'.
^]

If I look at logs for the arbiter, it seems as though it might not be working, and I don't see how telnetting to the arbiter verifies its operational status.

[2020-12-15 15:22:31.615878] E [MSGID: 115001] [server-handshake.c:584:server_setvolume] 0-ta-server: Cannot authenticate client from CTX_ID:fe5e65be-0254-4e46-8a5c-fe7b8e453459-GRAPH_ID:0-PID:1495-HOST:server2-PC_NAME:gvolume0-ta-2-RECON_NO:-75028 8.3 because brick is not attached in graph [No such file or directory]

Even if you verify the service status:

root@arbiter:~# systemctl status thin-arbiter
● thin-arbiter.service - GlusterFS, Thin-arbiter process to maintain quorum f>
     Loaded: loaded (/etc/systemd/system/thin-arbiter.service; enabled; vendo>
     Active: active (running) since Mon 2020-12-14 18:26:05 UTC; 20h ago
   Main PID: 9872 (glusterfsd)
     Memory: 1.0G
     CGroup: /system.slice/thin-arbiter.service
             └─9872 /usr/sbin/glusterfsd -N --volfile-id ta -f /mnt/brick1/gv>

Dec 14 18:26:05 arbiter systemd[1]: Started GlusterFS, Thin-arbiter process t>
lines 1-9/9 (END)

It doesn't validate that the thin-arbiter is operational.

alphabet5 avatar Dec 15 '20 15:12 alphabet5

To clarify; I'm thinking all of this information would be useful to have in Thin-Arbiter-Volumes.md.

If you want me to take a stab at a pull request, let me know.

I also haven't found an example for using setup-thin-arbiter.sh yet. I'm guessing something like cd /mnt/dir/thin-arbiter-dir && sudo /?/?/?/setup-thin-arbiter.sh

alphabet5 avatar Dec 15 '20 15:12 alphabet5

If the arbiter is online, but unreachable from the cluster, how would I know

It needs to be reachable only from the (fuse) clients and not the cluster. So if a client is not connected to any of the bricks, including the TA brick, the fuse mount logs will have messages like disconnected from distrep-client-0 etc. Conversely, upon an established connection, you will see Connected to distrep-client-0 etc. in the logs.
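A rough way to apply this is to scan the fuse mount log for those two phrases and keep the last state seen for each client name. The sketch below matches only the wording quoted above ("Connected to <name>" and "disconnected from <name>"); the function name `client_states` and the log path in the usage comment are assumptions, and the exact log format may differ between releases.

```shell
#!/bin/sh
# Print "<client-name> connected|disconnected" for each client translator
# mentioned in a fuse mount log, using the most recent matching line.
# Usage: client_states /path/to/mount.log
client_states() {
    awk '
        match($0, /(Connected to|disconnected from) [A-Za-z0-9_-]+/) {
            hit = substr($0, RSTART, RLENGTH)
            n = split(hit, w, " ")
            # w[n] is the client name; later lines overwrite earlier ones.
            state[w[n]] = (w[1] == "Connected") ? "connected" : "disconnected"
        }
        END { for (c in state) print c, state[c] }
    ' "$1" | sort
}

# Usage on a client (hypothetical log path for a mount at /mnt/gvolume0):
#   client_states /var/log/glusterfs/mnt-gvolume0.log
```

Going by the error log earlier in this thread, the thin-arbiter client for a volume named gvolume0 would appear as gvolume0-ta-2, so that is the entry to look for in the output.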

If you want me to take a stab at a pull request, let me know.

Sure go ahead.

I also haven't found an example for using setup-thin-arbiter.sh yet

Slide 23 of https://archive.fosdem.org/2020/schedule/event/sds_gluster_thin_arbiter/attachments/slides/4110/export/events/attachments/sds_gluster_thin_arbiter/slides/4110/gluster_thin_arbiter_fosdem_2020.pdf has an embedded demo, check it out!

itisravi avatar Dec 16 '20 04:12 itisravi

Is it possible to remove a thin-arbiter brick?

gluster volume remove-brick gvolume0 replica 2 thin-arbiter 1 arbiter:/mnt/brick1/gvolume0/thin-arbiter.vol force
wrong brick type: thin-arbiter, use <HOSTNAME>:<export-dir-abs-path>

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

alphabet5 avatar Dec 27 '20 17:12 alphabet5

Per that slide deck, support for add/replace-brick is on the todo list yet.

TODO

  • Support for add/replace-brick CLI:
    • convert existing replica 2/3/arbiter to TA volume.
    • replace brick for data-bricks and TA node.
  • Make reads aware of in-memory information about bad brick.
  • Fix reported bugs.

alphabet5 avatar Dec 27 '20 17:12 alphabet5

on the todo list yet.

Yes @Sheetalpamecha is working on this via https://github.com/gluster/glusterfs/issues/1528

itisravi avatar Dec 28 '20 04:12 itisravi

I fought with Thin Arbiter last night too. The MD file doesn't give ANY information about VOLUME_FILE, or how to get or create it. The command to create a volume with a thin arbiter is still inaccurate.

And even though I did my best to configure Thin-Arbiter correctly, I have no idea how to verify whether it works. And because there is no way to reconfigure a volume (https://github.com/gluster/glusterfs/issues/1528 is dead for now), any fix in the future means getting all data out of the volume and re-creating it.

I think that this part of documentation needs significant improvements...

polachz-nxp avatar Jul 27 '23 07:07 polachz-nxp

I finally found a way to get GlusterFS Thin Arbiter up and running. Here is my how-to:

https://polach.me/posts/howto-setup-glusterfs-thin-arbiter-at-homelab/

Maybe it can save others some time...

polachz avatar Aug 26 '23 08:08 polachz