Cannot do geo-replication or brick addition between GlusterFS (v9.6) servers where one server is CentOS 7 and the other is Rocky 8
Description of problem:
Using GlusterFS 9 on CentOS 7, attempting to add a brick to the cluster or to geo-replicate with GlusterFS 9 on Rocky 8 (CentOS 8 / RHEL 8) fails reliably. CentOS 7 to CentOS 7 or Rocky 8 to Rocky 8 works fine.
The exact command to reproduce the issue:
On the geo-rep node (slave), CentOS 7:
[root@test-geo brick]# glusterd --version
glusterfs 9.6
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@test-geo brick]# mkdir /root/brick
[root@test-geo brick]# gluster volume create test 10.6.2.53:/root/brick force
volume create: test: success: please start the volume to access data
[root@test-geo brick]# gluster volume start test
volume start: test: success
[root@test-geo brick]# gluster volume status
Status of volume: test
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.6.2.53:/root/brick 49152 0 Y 25690
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
On the master node (Rocky 8):
[root@staging-glus9-1 ~]# gluster volume geo-replication staging-gluster-store 10.6.2.53::test create push-pem force
Creating geo-replication session between staging-gluster-store & 10.6.2.53::test has been successful
[root@staging-glus9-1 ~]# gluster volume geo-replication staging-gluster-store 10.6.2.53::test start
Starting geo-replication session between staging-gluster-store & 10.6.2.53::test has been successful
[root@staging-glus9-1 ~]# gluster volume geo-replication status
MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.6.2.225 staging-gluster-store /mnt/staging-gluster1-store/brick root ssh://10.6.2.53::test Passive N/A N/A
10.6.2.228 staging-gluster-store /mnt/staging-gluster2-store/brick root ssh://10.6.2.53::test N/A Faulty N/A N/A
The full output of the command that failed:
[2022-08-26 13:59:12.590336 +0000] I [MSGID: 106316] [glusterd-geo-rep.c:3312:glusterd_op_stage_gsync_create] 0-management: 10.6.2.53::test is not a valid slave volume. Error: Total disk size of master is greater than disk size of slave.
Total available size of master is greater than available size of slave. Force creating geo-rep session.
[2022-08-26 13:59:12.590430 +0000] W [MSGID: 106028] [glusterd-geo-rep.c:2728:glusterd_get_statefile_name] 0-management: Config file (/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf) missing. Looking for template config file (/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such file or directory]
[2022-08-26 13:59:12.590449 +0000] I [MSGID: 106294] [glusterd-geo-rep.c:2738:glusterd_get_statefile_name] 0-management: Using default config template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).
[2022-08-26 13:59:15.924574 +0000] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xeb736) [0x7f0f41c33736] -->/usr/lib64/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xeb0b6) [0x7f0f41c330b6] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f0f562b8925] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh --volname=staging-gluster-store is_push_pem=1,pub_file=/var/lib/glusterd/geo-replication/common_secret.pem.pub,slave_user=root,slave_ip=10.6.2.53,slave_vol=test,ssh_port=22
[2022-08-26 13:59:20.610811 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:2722:glusterd_get_statefile_name] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
[2022-08-26 13:59:21.285258 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}, {errno=9}, {error=Bad file descriptor}]
[2022-08-26 13:59:21.472660 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}, {errno=61}, {error=No data available}]
[2022-08-26 13:59:23.089805 +0000] I [MSGID: 106496] [glusterd-handshake.c:969:__server_getspec] 0-management: Received mount request for volume staging-gluster-store
[2022-08-26 13:59:26.202468 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:332:__glusterd_handle_gsync_set] 0-management: master not found, while handling geo-replication options
[2022-08-26 13:59:26.202493 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:339:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2022-08-26 13:59:26.202528 +0000] W [MSGID: 106061] [glusterd-geo-rep.c:2413:glusterd_op_gsync_args_get] 0-management: master not found
[2022-08-26 13:59:26.202541 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:93:glusterd_validate_quorum] 0-management: Dict get failed [{Key=volname}]
[2022-08-26 13:59:26.310960 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:4691:glusterd_read_status_file] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
[2022-08-26 13:59:26.482522 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:26.767654 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:29.025620 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:332:__glusterd_handle_gsync_set] 0-management: master not found, while handling geo-replication options
[2022-08-26 13:59:29.025662 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:339:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2022-08-26 13:59:29.025695 +0000] W [MSGID: 106061] [glusterd-geo-rep.c:2413:glusterd_op_gsync_args_get] 0-management: master not found
[2022-08-26 13:59:29.025709 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:93:glusterd_validate_quorum] 0-management: Dict get failed [{Key=volname}]
[2022-08-26 13:59:29.136745 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:4691:glusterd_read_status_file] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
[2022-08-26 13:59:29.307944 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:29.597878 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:31.935850 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:332:__glusterd_handle_gsync_set] 0-management: master not found, while handling geo-replication options
[2022-08-26 13:59:31.935879 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:339:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2022-08-26 13:59:31.935913 +0000] W [MSGID: 106061] [glusterd-geo-rep.c:2413:glusterd_op_gsync_args_get] 0-management: master not found
[2022-08-26 13:59:31.935925 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:93:glusterd_validate_quorum] 0-management: Dict get failed [{Key=volname}]
[2022-08-26 13:59:32.043509 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:4691:glusterd_read_status_file] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
Expected results:
The geo-rep session should start and transfer data across (this works fine Rocky 8 to Rocky 8 and CentOS 7 to CentOS 7 on this version).
Mandatory info:
- The output of the gluster volume info command:
Master:
[root@staging-glus9-1 ~]# gluster volume info
Volume Name: staging-gluster-archive
Type: Replicate
Volume ID: 7022b18f-9393-4dab-8df9-c985d6d54ef3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.6.2.228:/mnt/staging-gluster2-archive/brick
Brick2: 10.6.2.225:/mnt/staging-gluster1-archive/brick
Options Reconfigured:
features.read-only: off
network.ping-timeout: 5
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
Volume Name: staging-gluster-store
Type: Replicate
Volume ID: 52d36274-ddf2-4af7-a633-829e9607924f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.6.2.228:/mnt/staging-gluster2-store/brick
Brick2: 10.6.2.225:/mnt/staging-gluster1-store/brick
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.read-only: off
network.ping-timeout: 5
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
[root@staging-glus9-1 ~]# gluster volume status
Status of volume: staging-gluster-archive
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.6.2.228:/mnt/staging-gluster2-archive/brick    49152     0          Y       3402
Brick 10.6.2.225:/mnt/staging-gluster1-archive/brick    49152     0          Y       1211
Self-heal Daemon on localhost                           N/A       N/A        Y       1523
Self-heal Daemon on staging-glus9-2                     N/A       N/A        Y       3757
Task Status of Volume staging-gluster-archive
There are no active volume tasks
Status of volume: staging-gluster-store
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.6.2.228:/mnt/staging-gluster2-store/brick      49153     0          Y       3599
Brick 10.6.2.225:/mnt/staging-gluster1-store/brick      49153     0          Y       1380
Self-heal Daemon on localhost                           N/A       N/A        Y       1523
Self-heal Daemon on staging-glus9-2                     N/A       N/A        Y       3757
Task Status of Volume staging-gluster-store
There are no active volume tasks
Geo-rep slave:
[root@test-geo brick]# gluster volume info
Volume Name: test
Type: Distribute
Volume ID: 63857233-05b9-4428-9265-e767dc06a1bb
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.6.2.53:/root/brick
Options Reconfigured:
features.read-only: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
[root@test-geo brick]# gluster volume status
Status of volume: test
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.6.2.53:/root/brick                             49152     0          Y       25690
Task Status of Volume test
There are no active volume tasks
- The output of the gluster volume heal command:
- Provide logs present on the following locations of client and server nodes: /var/log/glusterfs/
- Is there any crash? Provide the backtrace and coredump
Additional info:
- The operating system / glusterfs version: master: Rocky Linux release 8.6 (Green Obsidian) / Linux staging-glus9-1 4.18.0-372.19.1.el8_6.x86_64 #1 SMP Tue Aug 2 16:19:42 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux / glusterfs 9.6
slave / geo-rep target: CentOS Linux release 7.9.2009 (Core) / Linux test-geo 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux / glusterfs 9.6
Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration
Can you check whether the GlusterFS version is the same on Rocky Linux (I think it should be version 10.2)? If that's the case, I don't know if it's related, but Gluster 10 introduced some changes to the output of some commands related to geo-replication, and maybe one node is not able to parse the output produced by the other version. Try to run the command "gluster volume geo-replication status --xml" on your different working clusters (CentOS 7 -> CentOS 7 and Rocky Linux 8 -> Rocky Linux 8) and tell us whether the names of all XML elements are the same.
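For example, a minimal sketch (not from the original comment) that dumps the set of XML element names so the output of the two clusters can be diffed directly:

import subprocess
import xml.etree.ElementTree as ET

# Collect every distinct element name from the geo-replication status XML.
out = subprocess.check_output(
    ["gluster", "volume", "geo-replication", "status", "--xml"])
tags = sorted({elem.tag for elem in ET.fromstring(out).iter()})
print("\n".join(tags))  # run on each cluster and diff the two lists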
Both nodes are running Gluster 9.6 (one on Rocky 8 and the other on CentOS 7) - sorry if I didn't make that clear.
This is the thing that causes me the most concern - I really wouldn't expect interoperability issues on the same version (irrespective of platform).
Output of gluster volume geo-replication status --xml on the master node:
gluster volume geo-replication status --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr>master not found</opErrstr>
<geoRep>
<volume>
<name>staging-gluster-store</name>
<sessions>
<session>
<session_slave>77cf1069-a4ff-44ed-ac82-44cad2054c62:ssh://10.6.2.53::test:63857233-05b9-4428-9265-e767dc06a1bb</session_slave>
<pair>
<master_node>10.6.2.225</master_node>
<master_brick>/mnt/staging-gluster1-store/brick</master_brick>
<slave_user>root</slave_user>
<slave>ssh://10.6.2.53::test</slave>
<slave_node>N/A</slave_node>
<status>Faulty</status>
<crawl_status>N/A</crawl_status>
<entry>N/A</entry>
<data>N/A</data>
<meta>N/A</meta>
<failures>N/A</failures>
<checkpoint_completed>N/A</checkpoint_completed>
<master_node_uuid>77cf1069-a4ff-44ed-ac82-44cad2054c62</master_node_uuid>
<last_synced>N/A</last_synced>
<checkpoint_time>N/A</checkpoint_time>
<checkpoint_completion_time>N/A</checkpoint_completion_time>
</pair>
<pair>
<master_node>10.6.2.228</master_node>
<master_brick>/mnt/staging-gluster2-store/brick</master_brick>
<slave_user>root</slave_user>
<slave>ssh://10.6.2.53::test</slave>
<slave_node>N/A</slave_node>
<status>Faulty</status>
<crawl_status>N/A</crawl_status>
<entry>N/A</entry>
<data>N/A</data>
<meta>N/A</meta>
<failures>N/A</failures>
<checkpoint_completed>N/A</checkpoint_completed>
<master_node_uuid>a83a7f51-f93c-4af2-9353-aff16cbb615c</master_node_uuid>
<last_synced>N/A</last_synced>
<checkpoint_time>N/A</checkpoint_time>
<checkpoint_completion_time>N/A</checkpoint_completion_time>
</pair>
</session>
</sessions>
</volume>
</geoRep>
</cliOutput>
The above cannot be run on the slave node, so I'm not sure how useful it is, but the target (slave) node's volume status is:
gluster volume status --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr/>
<volStatus>
<volumes>
<volume>
<volName>test</volName>
<nodeCount>1</nodeCount>
<node>
<hostname>10.6.2.53</hostname>
<path>/root/brick</path>
<peerid>6c24cb21-c1d2-43ea-917c-63b02e1afe8a</peerid>
<status>1</status>
<port>49152</port>
<ports>
<tcp>49152</tcp>
<rdma>N/A</rdma>
</ports>
<pid>25690</pid>
</node>
<tasks/>
</volume>
</volumes>
</volStatus>
</cliOutput>
Any chance of an update on this, please? Geo-replication is an important part of our DR plan. If I can't progress this, I may need to revert to GlusterFS version 7.
Please check /var/log/glusterfs/geo-replication/<primary_ip_secondary>/gsyncd.log on both the primary and secondary nodes; it might give you some clue.
Have you tried adding a brick to a running geo-rep session?
Thanks for your attention on this, and apologies for the delay in responding.
I deleted the geo-replication session and started again with it - these were the resulting logs following a push-pem and start of the new geo-replication session.
Master log (Rocky 8):
[2022-09-14 13:29:13.646251] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2022-09-14 13:29:13.646447] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/mnt/staging-gluster1-store/brick}, {slave_node=10.6.2.53}]
[2022-09-14 13:29:13.725821] I [resource(worker /mnt/staging-gluster1-store/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2022-09-14 13:30:13.659664] I [monitor(monitor):241:monitor] Monitor: Worker not confirmed after wait, aborting it. Gsyncd invocation on remote slave via SSH or gluster master mount might have hung. Please check the above logs for exact issue and check master or slave volume for errors. Restarting master/slave volume accordingly might help. [{brick=/mnt/staging-gluster1-store/brick}, {timeout=60}]
[2022-09-14 13:30:13.664363] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2022-09-14 13:30:23.677638] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2022-09-14 13:30:23.677780] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/mnt/staging-gluster1-store/brick}, {slave_node=10.6.2.53}]
[2022-09-14 13:30:23.752400] I [resource(worker /mnt/staging-gluster1-store/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2022-09-14 13:30:25.169637] I [resource(worker /mnt/staging-gluster1-store/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4170}]
[2022-09-14 13:30:25.169842] I [resource(worker /mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:30:26.203054] I [resource(worker /mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0331}]
[2022-09-14 13:30:26.203222] I [subcmds(worker /mnt/staging-gluster1-store/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2022-09-14 13:30:28.215208] I [master(worker /mnt/staging-gluster1-store/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/staging-gluster-store_10.6.2.53_test/mnt-staging-gluster1-store-brick}]
[2022-09-14 13:30:28.215479] I [resource(worker /mnt/staging-gluster1-store/brick):1292:service_loop] GLUSTER: Register time [{time=1663162228}]
[2022-09-14 13:30:28.227694] I [gsyncdstatus(worker /mnt/staging-gluster1-store/brick):287:set_passive] GeorepStatus: Worker Status Change [{status=Passive}]
[2022-09-14 13:30:31.592967] E [syncdutils(worker /mnt/staging-gluster1-store/brick):325:log_raise_exception] <top>: connection to peer is broken
[2022-09-14 13:30:31.755902] I [subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change [{status=Stopped}]
[2022-09-14 13:30:41.88936] I [subcmds(delete):173:subcmd_delete] <top>: geo-replication delete
[2022-09-14 13:30:50.604513] W [gsyncd(config-get):299:main] <top>: Session config file not exists, using the default config [{path=/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf}]
[2022-09-14 13:30:52.534513] I [subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change [{status=Created}]
[2022-09-14 13:32:07.743417] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2022-09-14 13:32:07.743619] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/mnt/staging-gluster1-store/brick}, {slave_node=10.6.2.53}]
[2022-09-14 13:32:07.815019] I [resource(worker /mnt/staging-gluster1-store/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2022-09-14 13:32:09.219334] I [resource(worker /mnt/staging-gluster1-store/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4041}]
[2022-09-14 13:32:09.219643] I [resource(worker /mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:10.249689] I [resource(worker /mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0299}]
[2022-09-14 13:32:10.249983] I [subcmds(worker /mnt/staging-gluster1-store/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2022-09-14 13:32:12.259141] I [master(worker /mnt/staging-gluster1-store/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/staging-gluster-store_10.6.2.53_test/mnt-staging-gluster1-store-brick}]
[2022-09-14 13:32:12.259487] I [resource(worker /mnt/staging-gluster1-store/brick):1292:service_loop] GLUSTER: Register time [{time=1663162332}]
[2022-09-14 13:32:12.268468] I [gsyncdstatus(worker /mnt/staging-gluster1-store/brick):287:set_passive] GeorepStatus: Worker Status Change [{status=Passive}]
Slave log (CentOS 7):
[2022-09-14 13:30:17.987849] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:30:19.126813] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1387}]
[2022-09-14 13:30:19.127486] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:30:19.127985] I [repce(slave 10.6.2.225/mnt/staging-gluster1-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:30:24.29012] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:30:25.164525] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1353}]
[2022-09-14 13:30:25.165018] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:30:31.598059] I [repce(slave 10.6.2.225/mnt/staging-gluster1-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:32:08.79457] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:08.375463] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:09.214279] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1347}]
[2022-09-14 13:32:09.214703] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:09.506212] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1306}]
[2022-09-14 13:32:09.506655] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:18.750761] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
[2022-09-14 13:32:18.776757] I [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:32:29.949736] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:31.83671] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1338}]
[2022-09-14 13:32:31.84393] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:35.322344] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
[2022-09-14 13:32:35.346811] I [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:32:46.491343] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:47.623702] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1321}]
[2022-09-14 13:32:47.624353] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:51.840122] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
[2022-09-14 13:32:51.856772] I [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:33:03.70065] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:33:04.202480] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1322}]
[2022-09-14 13:33:04.203071] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:33:08.453227] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
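"argument for 's' must be a string" is the error Python 2's struct module raises when an 's' format field is given a unicode object instead of a byte string, which would be consistent with the Python version mismatch discussed below. A minimal sketch of that behaviour, using a made-up value rather than anything from the session above:

import struct

gfid = u"63857233-05b9-4428-9265-e767dc06a1bb"  # hypothetical value; unicode, as a Python 3 peer would send

# Packing a byte string into an 's' field works on both Python 2 and Python 3.
packed = struct.pack("!36s", gfid.encode("utf-8"))

try:
    # On Python 2 the unicode object is rejected with:
    #   argument for 's' must be a string
    # (Python 3 raises a similar error asking for a bytes object.)
    struct.pack("!36s", gfid)
except struct.error as err:
    print(err)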
What Python version are you using?
The packaged version, from the CentOS SIG-based repo - Python 3 on Rocky 8 and Python 2 on CentOS 7:
Master (Rocky8)
[CentOS-Gluster-9]
name=CentOS-$releasever - Gluster 9
baseurl=http://mirror.centos.org/centos/8-stream/storage/$basearch/gluster-9/
glusterfs.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-cli.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-client-xlators.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-fuse.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-geo-replication.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-selinux.noarch 2.0.1-1.el8s @CentOS-Gluster-9
glusterfs-server.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfapi0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfchangelog0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfrpc0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfxdr0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libglusterd0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libglusterfs0.x86_64 9.6-1.el8s @CentOS-Gluster-9
**python3-gluster.x86_64 9.6-1.el8s @CentOS-Gluster-9**
Slave (Centos 7)
name=CentOS-$releasever - Gluster 9
mirrorlist=http://mirrorlist.centos.org?arch=$basearch&release=$releasever&repo=storage-gluster-9
glusterfs.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-cli.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-client-xlators.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-fuse.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-geo-replication.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-server.x86_64 9.6-1.el7 @centos-gluster9
libgfapi0.x86_64 9.6-1.el7 @centos-gluster9
libgfchangelog0.x86_64 9.6-1.el7 @centos-gluster9
libgfrpc0.x86_64 9.6-1.el7 @centos-gluster9
libgfxdr0.x86_64 9.6-1.el7 @centos-gluster9
libglusterd0.x86_64 9.6-1.el7 @centos-gluster9
libglusterfs0.x86_64 9.6-1.el7 @centos-gluster9
**python2-gluster.x86_64 9.6-1.el7 @centos-gluster9**
userspace-rcu.x86_64 0.10.0-3.el7 @centos-gluster9
userspace-rcu-devel.x86_64 0.10.0-3.el7 @centos-gluster9
I had wondered about the Python version being different in the CentOS 7 package as a possible reason for the incompatibility, so after your query I decided to experiment with building from source on CentOS 7 against Python 3.
This appears to have solved the issue. I've put the build steps below for anyone interested who may also come across this issue (a brief post-install sanity check is sketched after the steps). Note that it's important to configure the build to match the prebuilt package locations, otherwise it will not work - geo-replication specifically won't work, as the master expects the executable/library locations to match on the slave:
yum install autoconf automake bison cmockery2-devel dos2unix flex fuse-devel glib2-devel libacl-devel libaio-devel libattr-devel libcurl-devel libibverbs-devel librdmacm-devel libtirpc-devel libtool libxml2-devel lvm2-devel make openssl-devel pkgconfig pyliblzma python-devel python-eventlet python-netifaces python-paste-deploy python-simplejson python-sphinx python-webob pyxattr readline-devel rpm-build sqlite-devel systemtap-sdt-devel tar userspace-rcu-devel wget python36-devel
cd /root
mkdir gluster
cd gluster
wget https://download.gluster.org/pub/gluster/glusterfs/9/9.6/glusterfs-9.6.tar.gz
tar -zxvf glusterfs-9.6.tar.gz
cd glusterfs-9.6
PYTHON_CFLAGS=/usr/include/python3.6m/
PYTHON_LIBS=/usr/include/python3.6m
./autogen.sh
./configure --without-libtirpc --disable-linux-io_uring --prefix=/usr --exec-prefix=/usr --libdir=/usr/lib64 --localstatedir=/var --sysconfdir=/etc
make
make install
systemctl daemon-reload
systemctl start glusterd
systemctl enable glusterd
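As an optional sanity check after the rebuild (a hypothetical sketch, not part of the original steps), you could confirm that a Python 3 interpreter is available and that the rebuilt syncdaemon modules landed in the path the master expects - the same path that appears in the tracebacks above:

import os
import sys

# Path taken from the tracebacks earlier in this issue.
syncdaemon = "/usr/libexec/glusterfs/python/syncdaemon"

print(sys.version)  # run with python3; expect a 3.x interpreter on the rebuilt slave
for mod in ("repce.py", "resource.py", "py2py3.py"):
    print(mod, os.path.exists(os.path.join(syncdaemon, mod)))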
BR
Martin
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.