Cannot do geo-replication or brick addition between GlusterFS (v9.6) servers where one server is CentOS 7 and the other is Rocky 8
Description of problem:
Using GlusterFS 9 on CentOS 7, attempting to add a brick to the cluster or to geo-replicate with GlusterFS 9 on Rocky 8 (CentOS 8 / RHEL 8) fails reliably. CentOS 7 to CentOS 7 or Rocky 8 to Rocky 8 works fine.
The exact command to reproduce the issue:
On the geo-rep node (slave), CentOS 7:
[root@test-geo brick]# glusterd --version
glusterfs 9.6
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@test-geo brick]# mkdir /root/brick
[root@test-geo brick]# gluster volume create test 10.6.2.53:/root/brick force
volume create: test: success: please start the volume to access data
[root@test-geo brick]# gluster volume start test
volume start: test: success
[root@test-geo brick]# gluster volume status
Status of volume: test
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.6.2.53:/root/brick 49152 0 Y 25690
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
On the master node (Rocky 8):
[root@staging-glus9-1 ~]# gluster volume geo-replication staging-gluster-store 10.6.2.53::test create push-pem force
Creating geo-replication session between staging-gluster-store & 10.6.2.53::test has been successful
[root@staging-glus9-1 ~]# gluster volume geo-replication staging-gluster-store 10.6.2.53::test start
Starting geo-replication session between staging-gluster-store & 10.6.2.53::test has been successful
[root@staging-glus9-1 ~]# gluster volume geo-replication status
MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.6.2.225 staging-gluster-store /mnt/staging-gluster1-store/brick root ssh://10.6.2.53::test Passive N/A N/A
10.6.2.228 staging-gluster-store /mnt/staging-gluster2-store/brick root ssh://10.6.2.53::test N/A Faulty N/A N/A
The full output of the command that failed:
[2022-08-26 13:59:12.590336 +0000] I [MSGID: 106316] [glusterd-geo-rep.c:3312:glusterd_op_stage_gsync_create] 0-management: 10.6.2.53::test is not a valid slave volume. Error: Total disk size of master is greater than disk size of slave.
Total available size of master is greater than available size of slave. Force creating geo-rep session.
[2022-08-26 13:59:12.590430 +0000] W [MSGID: 106028] [glusterd-geo-rep.c:2728:glusterd_get_statefile_name] 0-management: Config file (/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf) missing. Looking for template config file (/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such file or directory]
[2022-08-26 13:59:12.590449 +0000] I [MSGID: 106294] [glusterd-geo-rep.c:2738:glusterd_get_statefile_name] 0-management: Using default config template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).
[2022-08-26 13:59:15.924574 +0000] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xeb736) [0x7f0f41c33736] -->/usr/lib64/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xeb0b6) [0x7f0f41c330b6] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f0f562b8925] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh --volname=staging-gluster-store is_push_pem=1,pub_file=/var/lib/glusterd/geo-replication/common_secret.pem.pub,slave_user=root,slave_ip=10.6.2.53,slave_vol=test,ssh_port=22
[2022-08-26 13:59:20.610811 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:2722:glusterd_get_statefile_name] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
[2022-08-26 13:59:21.285258 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}, {errno=9}, {error=Bad file descriptor}]
[2022-08-26 13:59:21.472660 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}, {errno=61}, {error=No data available}]
[2022-08-26 13:59:23.089805 +0000] I [MSGID: 106496] [glusterd-handshake.c:969:__server_getspec] 0-management: Received mount request for volume staging-gluster-store
[2022-08-26 13:59:26.202468 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:332:__glusterd_handle_gsync_set] 0-management: master not found, while handling geo-replication options
[2022-08-26 13:59:26.202493 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:339:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2022-08-26 13:59:26.202528 +0000] W [MSGID: 106061] [glusterd-geo-rep.c:2413:glusterd_op_gsync_args_get] 0-management: master not found
[2022-08-26 13:59:26.202541 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:93:glusterd_validate_quorum] 0-management: Dict get failed [{Key=volname}]
[2022-08-26 13:59:26.310960 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:4691:glusterd_read_status_file] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
[2022-08-26 13:59:26.482522 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:26.767654 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:29.025620 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:332:__glusterd_handle_gsync_set] 0-management: master not found, while handling geo-replication options
[2022-08-26 13:59:29.025662 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:339:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2022-08-26 13:59:29.025695 +0000] W [MSGID: 106061] [glusterd-geo-rep.c:2413:glusterd_op_gsync_args_get] 0-management: master not found
[2022-08-26 13:59:29.025709 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:93:glusterd_validate_quorum] 0-management: Dict get failed [{Key=volname}]
[2022-08-26 13:59:29.136745 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:4691:glusterd_read_status_file] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
[2022-08-26 13:59:29.307944 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:29.597878 +0000] E [MSGID: 106061] [glusterd-utils.c:10202:glusterd_append_gsync_status] 0-glusterd: Dict get failed [{Key=gsync-status}]
[2022-08-26 13:59:31.935850 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:332:__glusterd_handle_gsync_set] 0-management: master not found, while handling geo-replication options
[2022-08-26 13:59:31.935879 +0000] I [MSGID: 106061] [glusterd-geo-rep.c:339:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2022-08-26 13:59:31.935913 +0000] W [MSGID: 106061] [glusterd-geo-rep.c:2413:glusterd_op_gsync_args_get] 0-management: master not found
[2022-08-26 13:59:31.935925 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:93:glusterd_validate_quorum] 0-management: Dict get failed [{Key=volname}]
[2022-08-26 13:59:32.043509 +0000] I [MSGID: 106327] [glusterd-geo-rep.c:4691:glusterd_read_status_file] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf).
Expected results:
The geo-rep session should start and transfer data across (this works fine Rocky 8 to Rocky 8 and CentOS 7 to CentOS 7 on this version).
Mandatory info:
- The output of the gluster volume info command:
Master:
[root@staging-glus9-1 ~]# gluster volume info
Volume Name: staging-gluster-archive
Type: Replicate
Volume ID: 7022b18f-9393-4dab-8df9-c985d6d54ef3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.6.2.228:/mnt/staging-gluster2-archive/brick
Brick2: 10.6.2.225:/mnt/staging-gluster1-archive/brick
Options Reconfigured:
features.read-only: off
network.ping-timeout: 5
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
Volume Name: staging-gluster-store
Type: Replicate
Volume ID: 52d36274-ddf2-4af7-a633-829e9607924f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.6.2.228:/mnt/staging-gluster2-store/brick
Brick2: 10.6.2.225:/mnt/staging-gluster1-store/brick
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.read-only: off
network.ping-timeout: 5
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
[root@staging-glus9-1 ~]# gluster volume status
Status of volume: staging-gluster-archive
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.6.2.228:/mnt/staging-gluster2-archive/brick    49152     0          Y       3402
Brick 10.6.2.225:/mnt/staging-gluster1-archive/brick    49152     0          Y       1211
Self-heal Daemon on localhost                           N/A       N/A        Y       1523
Self-heal Daemon on staging-glus9-2                     N/A       N/A        Y       3757
Task Status of Volume staging-gluster-archive
There are no active volume tasks
Status of volume: staging-gluster-store
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.6.2.228:/mnt/staging-gluster2-store/brick      49153     0          Y       3599
Brick 10.6.2.225:/mnt/staging-gluster1-store/brick      49153     0          Y       1380
Self-heal Daemon on localhost                           N/A       N/A        Y       1523
Self-heal Daemon on staging-glus9-2                     N/A       N/A        Y       3757
Task Status of Volume staging-gluster-store
There are no active volume tasks
Geo-rep slave:
[root@test-geo brick]# gluster volume info
Volume Name: test
Type: Distribute
Volume ID: 63857233-05b9-4428-9265-e767dc06a1bb
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.6.2.53:/root/brick
Options Reconfigured:
features.read-only: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
[root@test-geo brick]# gluster volume status
Status of volume: test
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.6.2.53:/root/brick                             49152     0          Y       25690
Task Status of Volume test
There are no active volume tasks
- The output of the gluster volume heal command:
- Provide logs present on the following locations of client and server nodes: /var/log/glusterfs/
- Is there any crash? Provide the backtrace and coredump
Additional info:
- The operating system / glusterfs version: master: Rocky Linux release 8.6 (Green Obsidian) / Linux staging-glus9-1 4.18.0-372.19.1.el8_6.x86_64 #1 SMP Tue Aug 2 16:19:42 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux / glusterfs 9.6
slave / geo-rep target: CentOS Linux release 7.9.2009 (Core) / Linux test-geo 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux / glusterfs 9.6
Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration
Can you check whether the GlusterFS version is the same on Rocky Linux (I think it should be version 10.2)? If that's the case, I don't know if it's related, but Gluster 10 introduced some changes to the output of some commands related to geo-replication, and maybe one node is not able to parse the output produced by the other version. Try to run the command "gluster volume geo-replication status --xml" on your different working clusters (CentOS 7 -> CentOS 7 and Rocky Linux 8 -> Rocky Linux 8) and tell us whether the names of all XML elements are the same.
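For example, a minimal sketch (not from the original comment) that dumps the set of XML element names so the output of the two clusters can be diffed directly:

import subprocess
import xml.etree.ElementTree as ET

# Collect every distinct element name from the geo-replication status XML.
out = subprocess.check_output(
    ["gluster", "volume", "geo-replication", "status", "--xml"])
tags = sorted({elem.tag for elem in ET.fromstring(out).iter()})
print("\n".join(tags))  # run on each cluster and diff the two lists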
Both nodes are running Gluster 9.6 (one on Rocky 8 and the other on CentOS 7) - sorry if I didn't make that clear.
This is the thing that causes me the most concern - I really wouldn't expect interoperability issues on the same version (irrespective of platform).
Output of gluster volume geo-replication status --xml on the master node:
gluster volume geo-replication status --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr>master not found</opErrstr>
<geoRep>
<volume>
<name>staging-gluster-store</name>
<sessions>
<session>
<session_slave>77cf1069-a4ff-44ed-ac82-44cad2054c62:ssh://10.6.2.53::test:63857233-05b9-4428-9265-e767dc06a1bb</session_slave>
<pair>
<master_node>10.6.2.225</master_node>
<master_brick>/mnt/staging-gluster1-store/brick</master_brick>
<slave_user>root</slave_user>
<slave>ssh://10.6.2.53::test</slave>
<slave_node>N/A</slave_node>
<status>Faulty</status>
<crawl_status>N/A</crawl_status>
<entry>N/A</entry>
<data>N/A</data>
<meta>N/A</meta>
<failures>N/A</failures>
<checkpoint_completed>N/A</checkpoint_completed>
<master_node_uuid>77cf1069-a4ff-44ed-ac82-44cad2054c62</master_node_uuid>
<last_synced>N/A</last_synced>
<checkpoint_time>N/A</checkpoint_time>
<checkpoint_completion_time>N/A</checkpoint_completion_time>
</pair>
<pair>
<master_node>10.6.2.228</master_node>
<master_brick>/mnt/staging-gluster2-store/brick</master_brick>
<slave_user>root</slave_user>
<slave>ssh://10.6.2.53::test</slave>
<slave_node>N/A</slave_node>
<status>Faulty</status>
<crawl_status>N/A</crawl_status>
<entry>N/A</entry>
<data>N/A</data>
<meta>N/A</meta>
<failures>N/A</failures>
<checkpoint_completed>N/A</checkpoint_completed>
<master_node_uuid>a83a7f51-f93c-4af2-9353-aff16cbb615c</master_node_uuid>
<last_synced>N/A</last_synced>
<checkpoint_time>N/A</checkpoint_time>
<checkpoint_completion_time>N/A</checkpoint_completion_time>
</pair>
</session>
</sessions>
</volume>
</geoRep>
</cliOutput>
The above cannot be run on the slave node, so I'm not sure how useful it is, but the target (slave) node's volume status is:
gluster volume status --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr/>
<volStatus>
<volumes>
<volume>
<volName>test</volName>
<nodeCount>1</nodeCount>
<node>
<hostname>10.6.2.53</hostname>
<path>/root/brick</path>
<peerid>6c24cb21-c1d2-43ea-917c-63b02e1afe8a</peerid>
<status>1</status>
<port>49152</port>
<ports>
<tcp>49152</tcp>
<rdma>N/A</rdma>
</ports>
<pid>25690</pid>
</node>
<tasks/>
</volume>
</volumes>
</volStatus>
</cliOutput>
Any chance of an update on this, please? Geo-replication is an important part of our DR plan. If I can't progress this, I may need to revert to GlusterFS version 7.
Please check /var/log/glusterfs/geo-replication/<primary_ip_secondary>/gsyncd.log on both the primary and secondary nodes; it might give you some clue.
Have you tried adding a brick to a running geo-rep session?
Thanks for your attention on this, and apologies for the delay in responding.
I deleted the geo-replication session and started again with it - these were the resulting logs following a push-pem and start of the new geo-replication session.
Master log (Rocky 8):
[2022-09-14 13:29:13.646251] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2022-09-14 13:29:13.646447] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/mnt/staging-gluster1-store/brick}, {slave_node=10.6.2.53}]
[2022-09-14 13:29:13.725821] I [resource(worker /mnt/staging-gluster1-store/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2022-09-14 13:30:13.659664] I [monitor(monitor):241:monitor] Monitor: Worker not confirmed after wait, aborting it. Gsyncd invocation on remote slave via SSH or gluster master mount might have hung. Please check the above logs for exact issue and check master or slave volume for errors. Restarting master/slave volume accordingly might help. [{brick=/mnt/staging-gluster1-store/brick}, {timeout=60}]
[2022-09-14 13:30:13.664363] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2022-09-14 13:30:23.677638] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2022-09-14 13:30:23.677780] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/mnt/staging-gluster1-store/brick}, {slave_node=10.6.2.53}]
[2022-09-14 13:30:23.752400] I [resource(worker /mnt/staging-gluster1-store/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2022-09-14 13:30:25.169637] I [resource(worker /mnt/staging-gluster1-store/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4170}]
[2022-09-14 13:30:25.169842] I [resource(worker /mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:30:26.203054] I [resource(worker /mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0331}]
[2022-09-14 13:30:26.203222] I [subcmds(worker /mnt/staging-gluster1-store/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2022-09-14 13:30:28.215208] I [master(worker /mnt/staging-gluster1-store/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/staging-gluster-store_10.6.2.53_test/mnt-staging-gluster1-store-brick}]
[2022-09-14 13:30:28.215479] I [resource(worker /mnt/staging-gluster1-store/brick):1292:service_loop] GLUSTER: Register time [{time=1663162228}]
[2022-09-14 13:30:28.227694] I [gsyncdstatus(worker /mnt/staging-gluster1-store/brick):287:set_passive] GeorepStatus: Worker Status Change [{status=Passive}]
[2022-09-14 13:30:31.592967] E [syncdutils(worker /mnt/staging-gluster1-store/brick):325:log_raise_exception] <top>: connection to peer is broken
[2022-09-14 13:30:31.755902] I [subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change [{status=Stopped}]
[2022-09-14 13:30:41.88936] I [subcmds(delete):173:subcmd_delete] <top>: geo-replication delete
[2022-09-14 13:30:50.604513] W [gsyncd(config-get):299:main] <top>: Session config file not exists, using the default config [{path=/var/lib/glusterd/geo-replication/staging-gluster-store_10.6.2.53_test/gsyncd.conf}]
[2022-09-14 13:30:52.534513] I [subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change [{status=Created}]
[2022-09-14 13:32:07.743417] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2022-09-14 13:32:07.743619] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/mnt/staging-gluster1-store/brick}, {slave_node=10.6.2.53}]
[2022-09-14 13:32:07.815019] I [resource(worker /mnt/staging-gluster1-store/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2022-09-14 13:32:09.219334] I [resource(worker /mnt/staging-gluster1-store/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.4041}]
[2022-09-14 13:32:09.219643] I [resource(worker /mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:10.249689] I [resource(worker /mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0299}]
[2022-09-14 13:32:10.249983] I [subcmds(worker /mnt/staging-gluster1-store/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2022-09-14 13:32:12.259141] I [master(worker /mnt/staging-gluster1-store/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/staging-gluster-store_10.6.2.53_test/mnt-staging-gluster1-store-brick}]
[2022-09-14 13:32:12.259487] I [resource(worker /mnt/staging-gluster1-store/brick):1292:service_loop] GLUSTER: Register time [{time=1663162332}]
[2022-09-14 13:32:12.268468] I [gsyncdstatus(worker /mnt/staging-gluster1-store/brick):287:set_passive] GeorepStatus: Worker Status Change [{status=Passive}]
Slave log (CentOS 7):
[2022-09-14 13:30:17.987849] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:30:19.126813] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1387}]
[2022-09-14 13:30:19.127486] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:30:19.127985] I [repce(slave 10.6.2.225/mnt/staging-gluster1-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:30:24.29012] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:30:25.164525] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1353}]
[2022-09-14 13:30:25.165018] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:30:31.598059] I [repce(slave 10.6.2.225/mnt/staging-gluster1-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:32:08.79457] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:08.375463] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:09.214279] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1347}]
[2022-09-14 13:32:09.214703] I [resource(slave 10.6.2.225/mnt/staging-gluster1-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:09.506212] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1306}]
[2022-09-14 13:32:09.506655] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:18.750761] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
[2022-09-14 13:32:18.776757] I [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:32:29.949736] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:31.83671] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1338}]
[2022-09-14 13:32:31.84393] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:35.322344] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
[2022-09-14 13:32:35.346811] I [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:32:46.491343] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:32:47.623702] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1321}]
[2022-09-14 13:32:47.624353] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:32:51.840122] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
[2022-09-14 13:32:51.856772] I [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):96:service_loop] RepceServer: terminating on reaching EOF.
[2022-09-14 13:33:03.70065] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2022-09-14 13:33:04.202480] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1322}]
[2022-09-14 13:33:04.203071] I [resource(slave 10.6.2.228/mnt/staging-gluster2-store/brick):1166:service_loop] GLUSTER: slave listening
[2022-09-14 13:33:08.453227] E [repce(slave 10.6.2.228/mnt/staging-gluster2-store/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 578, in entry_ops
e['mode'], e['uid'], e['gid'])
File "/usr/libexec/glusterfs/python/syncdaemon/py2py3.py", line 176, in entry_pack_mkdir
stat.S_IMODE(mo), umask())
error: argument for 's' must be a string
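"argument for 's' must be a string" is the error Python 2's struct module raises when an 's' format field is given a unicode object instead of a byte string, which would be consistent with the Python version mismatch discussed below. A minimal sketch of that behaviour, using a made-up value rather than anything from the session above:

import struct

gfid = u"63857233-05b9-4428-9265-e767dc06a1bb"  # hypothetical value; unicode, as a Python 3 peer would send

# Packing a byte string into an 's' field works on both Python 2 and Python 3.
packed = struct.pack("!36s", gfid.encode("utf-8"))

try:
    # On Python 2 the unicode object is rejected with:
    #   argument for 's' must be a string
    # (Python 3 raises a similar error asking for a bytes object.)
    struct.pack("!36s", gfid)
except struct.error as err:
    print(err)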
What Python version are you using?
The packaged version, from the CentOS SIG-based repo - Python 3 on Rocky 8 and Python 2 on CentOS 7:
Master (Rocky8)
[CentOS-Gluster-9]
name=CentOS-$releasever - Gluster 9
baseurl=http://mirror.centos.org/centos/8-stream/storage/$basearch/gluster-9/
glusterfs.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-cli.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-client-xlators.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-fuse.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-geo-replication.x86_64 9.6-1.el8s @CentOS-Gluster-9
glusterfs-selinux.noarch 2.0.1-1.el8s @CentOS-Gluster-9
glusterfs-server.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfapi0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfchangelog0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfrpc0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libgfxdr0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libglusterd0.x86_64 9.6-1.el8s @CentOS-Gluster-9
libglusterfs0.x86_64 9.6-1.el8s @CentOS-Gluster-9
**python3-gluster.x86_64 9.6-1.el8s @CentOS-Gluster-9**
Slave (Centos 7)
name=CentOS-$releasever - Gluster 9
mirrorlist=http://mirrorlist.centos.org?arch=$basearch&release=$releasever&repo=storage-gluster-9
glusterfs.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-cli.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-client-xlators.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-fuse.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-geo-replication.x86_64 9.6-1.el7 @centos-gluster9
glusterfs-server.x86_64 9.6-1.el7 @centos-gluster9
libgfapi0.x86_64 9.6-1.el7 @centos-gluster9
libgfchangelog0.x86_64 9.6-1.el7 @centos-gluster9
libgfrpc0.x86_64 9.6-1.el7 @centos-gluster9
libgfxdr0.x86_64 9.6-1.el7 @centos-gluster9
libglusterd0.x86_64 9.6-1.el7 @centos-gluster9
libglusterfs0.x86_64 9.6-1.el7 @centos-gluster9
**python2-gluster.x86_64 9.6-1.el7 @centos-gluster9**
userspace-rcu.x86_64 0.10.0-3.el7 @centos-gluster9
userspace-rcu-devel.x86_64 0.10.0-3.el7 @centos-gluster9
I had wondered about the Python version being different in the CentOS 7 package as a possible reason for the incompatibility, so after your query I decided to experiment with building from source on CentOS 7 against Python 3.
This appears to have solved the issue. I've put the build steps below for anyone interested who may also come across this issue (a brief post-install sanity check is sketched after the steps). Note that it's important to configure the build to match the prebuilt package locations, otherwise it will not work - geo-replication specifically won't work, as the master expects the executable/library locations to match on the slave:
yum install autoconf automake bison cmockery2-devel dos2unix flex fuse-devel glib2-devel libacl-devel libaio-devel libattr-devel libcurl-devel libibverbs-devel librdmacm-devel libtirpc-devel libtool libxml2-devel lvm2-devel make openssl-devel pkgconfig pyliblzma python-devel python-eventlet python-netifaces python-paste-deploy python-simplejson python-sphinx python-webob pyxattr readline-devel rpm-build sqlite-devel systemtap-sdt-devel tar userspace-rcu-devel wget python36-devel
cd /root
mkdir gluster
cd gluster
wget https://download.gluster.org/pub/gluster/glusterfs/9/9.6/glusterfs-9.6.tar.gz
tar -zxvf glusterfs-9.6.tar.gz
cd glusterfs-9.6
PYTHON_CFLAGS=/usr/include/python3.6m/
PYTHON_LIBS=/usr/include/python3.6m
./autogen.sh
./configure --without-libtirpc --disable-linux-io_uring --prefix=/usr --exec-prefix=/usr --libdir=/usr/lib64 --localstatedir=/var --sysconfdir=/etc
make
make install
systemctl daemon-reload
systemctl start glusterd
systemctl enable glusterd
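As an optional sanity check after the rebuild (a hypothetical sketch, not part of the original steps), you could confirm that a Python 3 interpreter is available and that the rebuilt syncdaemon modules landed in the path the master expects - the same path that appears in the tracebacks above:

import os
import sys

# Path taken from the tracebacks earlier in this issue.
syncdaemon = "/usr/libexec/glusterfs/python/syncdaemon"

print(sys.version)  # run with python3; expect a 3.x interpreter on the rebuilt slave
for mod in ("repce.py", "resource.py", "py2py3.py"):
    print(mod, os.path.exists(os.path.join(syncdaemon, mod)))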
BR
Martin
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.