Transport endpoint not connected
Description of problem: Intermittently, and without any obvious trigger, gluster mounts lock up and clients receive "Transport endpoint is not connected". This happens even when the client is one of the brick nodes itself.
The exact command to reproduce the issue: the volume is mounted at boot via /etc/fstab:
SRVDCK01:/replvol /mnt/datarepl glusterfs defaults,_netdev,x-systemd.automount,backup-volfile-servers=srvdck02.int.hope.mx:srvfle02.int.hope.mx 0 0
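For clarity, that fstab entry should (as I understand it) be equivalent to running:

mount -t glusterfs \
    -o backup-volfile-servers=srvdck02.int.hope.mx:srvfle02.int.hope.mx \
    SRVDCK01:/replvol /mnt/datarepl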
The full output of the command that failed: any I/O to the mounted volume fails, e.g. ls.
Expected results: Normal I/O
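To illustrate what "any I/O" means, a plain ls against the mount fails with the generic ENOTCONN message, along these lines (reconstructed for illustration, not a capture from the incident):

$ ls /mnt/datarepl
ls: cannot access '/mnt/datarepl': Transport endpoint is not connected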
Mandatory info:
- The output of the gluster volume info command:
Volume Name: replvol
Type: Replicate
Volume ID: 0f51aa3d-df9e-4ae8-9a3a-9f993f403921
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: srvdck01:/mnt/gluster/brick1/vol1
Brick2: srvdck02:/mnt/gluster/brick1/vol1
Brick3: srvfle02:/mnt/gluster/brick1/vol1
Options Reconfigured:
locks.mandatory-locking: optimal
cluster.self-heal-daemon: enable
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
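For reference, the non-default options under "Options Reconfigured" were applied with gluster volume set, e.g.:

gluster volume set replvol locks.mandatory-locking optimal
gluster volume set replvol cluster.self-heal-daemon enable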
- The output of the gluster volume status command:
Status of volume: replvol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick srvdck01:/mnt/gluster/brick1/vol1 59388 0 Y 2443
Brick srvdck02:/mnt/gluster/brick1/vol1 56101 0 Y 902
Brick srvfle02:/mnt/gluster/brick1/vol1 60520 0 Y 1524
Self-heal Daemon on localhost N/A N/A Y 2491
Self-heal Daemon on srvdck02.int.hope.mx N/A N/A Y 936
Self-heal Daemon on srvfle02.int.hope.mx N/A N/A Y 1558
Task Status of Volume replvol
------------------------------------------------------------------------------
There are no active volume tasks
- The output of the gluster volume heal command:
root@SRVDCK01:/var/log# gluster volume heal replvol info
Brick srvdck01:/mnt/gluster/brick1/vol1
Status: Connected
Number of entries: 0
Brick srvdck02:/mnt/gluster/brick1/vol1
Status: Connected
Number of entries: 0
Brick srvfle02:/mnt/gluster/brick1/vol1
Status: Connected
Number of entries: 0
**- Provide logs present on following locations of client and server nodes -** /var/log/glusterfs/
/glusterfs/mnt-datarepl.log
[2022-08-01 19:41:19.938596 +0000] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-replvol-client-3: server 10.99.99.103:58463 has not responded in the last 42 seconds, disconnecting.
[2022-08-01 19:41:22.087831 +0000] I [socket.c:3801:socket_submit_outgoing_msg] 0-replvol-client-3: not connected (priv->connected = -1)
[2022-08-01 19:41:22.087907 +0000] W [rpc-clnt.c:1709:rpc_clnt_submit] 0-replvol-client-3: failed to submit rpc-request (unique: 8902413, XID: 0x47c8fe Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 35) to rpc-transport (replvol-client-3)
[2022-08-01 19:41:22.087949 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:968:client4_0_fgetxattr_cbk] 0-replvol-client-3: remote operation failed. [{errno=107}, {error=Transport endpoint is not connected}]
[2022-08-01 19:41:24.498911 +0000] W [rpc-clnt.c:1709:rpc_clnt_submit] 0-replvol-client-3: failed to submit rpc-request (unique: 8902448, XID: 0x47c8ff Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 14) to rpc-transport (replvol-client-3)
[2022-08-01 19:41:24.498998 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:613:client4_0_statfs_cbk] 0-replvol-client-3: remote operation failed. [{errno=107}, {error=Transport endpoint is not connected}]
[2022-08-01 19:41:25.939331 +0000] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-replvol-client-1: server 127.0.1.1:54115 has not responded in the last 42 seconds, disconnecting.
[2022-08-01 19:41:26.246116 +0000] I [socket.c:3801:socket_submit_outgoing_msg] 0-replvol-client-1: not connected (priv->connected = -1)
...last 4 messages repeat...
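Both disconnects above fire after 42 seconds, which matches the default network.ping-timeout; the effective value on this volume can be confirmed with:

gluster volume get replvol network.ping-timeout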
/glusterfs/glusterd.log.1
[2022-08-01 19:47:05.900343 +0000] W [glusterfsd.c:1458:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fc71410dea7] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xc5) [0x55c3cd228e25] -->/usr/sbin/glusterd(cleanup_and_exit+0x57) [0x55c3cd221437] ) 0-: received signum (15), shutting down
[2022-08-01 19:50:04.754802 +0000] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, {version=10.2}, {cmdlinestr=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO}]
[2022-08-01 19:50:04.759765 +0000] I [glusterfsd.c:2447:daemonize] 0-glusterfs: Pid of current running process is 489
[2022-08-01 19:50:04.845598 +0000] I [MSGID: 106478] [glusterd.c:1470:init] 0-management: Maximum allowed open file descriptors set to 65536
[2022-08-01 19:50:04.845707 +0000] I [MSGID: 106479] [glusterd.c:1560:init] 0-management: Using /var/lib/glusterd as working directory
[2022-08-01 19:50:04.845733 +0000] I [MSGID: 106479] [glusterd.c:1564:init] 0-management: Using /var/run/gluster as pid file working directory
/glusterfs/bricks/mnt-gluster-brick1-vol1.log.1
[2022-08-01 19:41:25.939605 +0000] W [socket.c:749:__socket_rwv] 0-tcp.replvol-server: readv on 127.0.0.1:49150 failed (No data available)
[2022-08-01 19:41:25.939722 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-replvol-server: disconnecting connection [{client-uid=CTX_ID:ab4fafee-614e-48c1-9b83-fd80c69696d5-GRAPH_ID:0-PID:1365-HOST:SRVDCK01-PC_NAME:replvol-client-1-RECON_NO:-0}]
[2022-08-01 19:41:25.944671 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on b262594d-5096-44e5-b531-b3d118e0473e held by {client=0x7f0790050c38, pid=4095 lk-owner=88fe0ae88f7f0000}
[2022-08-01 19:41:25.944741 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on 50835960-dd38-4012-9a24-f2a32607f5f1 held by {client=0x7f0790050c38, pid=3449912 lk-owner=18489cd48f7f0000}
[2022-08-01 19:41:25.944763 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on 4f03088d-e5f9-4a32-85a7-a4bfaa45062a held by {client=0x7f0790050c38, pid=6940 lk-owner=c80ebcd48f7f0000}
[2022-08-01 19:41:25.944783 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on 1c01d5df-3569-410e-8940-46bba8f47f89 held by {client=0x7f0790050c38, pid=211929 lk-owner=082e0fd48f7f0000}
[2022-08-01 19:41:25.944801 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on 5e693426-f334-4ca6-af3a-e2eb57e4e0e7 held by {client=0x7f0790050c38, pid=0 lk-owner=48df39d48f7f0000}
[2022-08-01 19:41:25.944820 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on a2b72bdb-f044-455b-9b61-21e5d750d515 held by {client=0x7f0790050c38, pid=0 lk-owner=884406d48f7f0000}
[2022-08-01 19:41:25.944837 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on 369458aa-27cc-459a-b248-b1e94b25480f held by {client=0x7f0790050c38, pid=0 lk-owner=989c8ed48f7f0000}
[2022-08-01 19:41:25.944855 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on 22f32d04-b60f-4816-8d59-0e62ebf4a868 held by {client=0x7f0790050c38, pid=8215 lk-owner=78fd48e48f7f0000}
[2022-08-01 19:41:25.944873 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-replvol-server: releasing lock on 702385aa-6067-4e70-8897-46dba8e21b88 held by {client=0x7f0790050c38, pid=8214 lk-owner=581183d48f7f0000}
[2022-08-01 19:41:25.945057 +0000] I [MSGID: 115013] [server-helpers.c:311:do_fd_cleanup] 0-replvol-server: fd cleanup [{path=/portainer_data/portainer.db}]
[2022-08-01 19:41:25.945735 +0000] I [MSGID: 115013] [server-helpers.c:311:do_fd_cleanup] 0-replvol-server: fd cleanup [{path=/tailscale_data/tailscaled.log1.txt}]
[2022-08-01 19:41:25.945812 +0000] I [MSGID: 115013] [server-helpers.c:311:do_fd_cleanup] 0-replvol-server: fd cleanup [{path=/tailscale_data/tailscaled.log2.txt}]
[2022-08-01 19:41:25.946045 +0000] I [MSGID: 115013] [server-helpers.c:311:do_fd_cleanup] 0-replvol-server: fd cleanup [{path=/bitwarden_data/db.sqlite3}]
The message "I [MSGID: 115013] [server-helpers.c:311:do_fd_cleanup] 0-replvol-server: fd cleanup [{path=/bitwarden_data/db.sqlite3}]" repeated 2 times between [2022-08-01 19:41:25.946045 +0000] and [2022-08-01 19:41:25.946146 +0000]
...etc..
**- Is there any crash? Provide the backtrace and coredump:** No crash.
Additional info:
- The operating system / glusterfs version:
root@SRVDCK01:/var/log# uname -a
Linux SRVDCK01 5.10.0-16-amd64 #1 SMP Debian 5.10.127-2 (2022-07-23) x86_64 GNU/Linux
root@SRVDCK01:/var/log# cat /etc/debian_version
11.4
root@SRVDCK01:/var/log# gluster --version
glusterfs 10.2
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration
This happens every few days. If there is additional detail I can collect the next time it happens, please let me know; my current plan is sketched below.
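What I intend to capture on the next occurrence (a best guess at what would be useful; happy to adjust):

# On a server node, while the hang is in progress:
gluster volume status replvol clients
gluster volume statedump replvol
# On the affected client, to see whether the FUSE client still has TCP connections to the bricks:
ss -tnp | grep gluster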