
Glusterfs (fuse) client crashes

Open MarvinTO opened this issue 4 years ago • 16 comments

Description of problem: We have a GlusterFS volume used to store some Grafana data. There is an SQLite database stored in the volume.

After upgrading the FUSE client to 8.5, the client mount started to crash constantly (an older version of the FUSE client, 7.7, works correctly). I was not able to find any errors on the server side at the time of the client crashes.

The exact command to reproduce the issue: the mount is handled by an entry in /etc/fstab:

gfs.prod.com:/grafana /opt/grafana glusterfs defaults 0 0
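
For reference, a minimal sketch of the same entry with explicit fallback volfile servers (backup-volfile-servers is a standard mount.glusterfs option; the extra hostnames are taken from the log below, and _netdev is optional):

# sketch only; hostnames are from this setup, adjust to your own
gfs.prod.com:/grafana /opt/grafana glusterfs defaults,_netdev,backup-volfile-servers=gfs008.prod.com:gfs009.prod.com 0 0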

The full output of the command that failed:

[2021-08-19 16:48:38.770606] I [MSGID: 100030] [glusterfsd.c:2689:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=8.5}, {cmdlinestr=/usr/sbin/glusterfs --process-name fuse --volfile-server=gfs.prod.com --volfile-id=/grafana /opt/grafana}]
[2021-08-19 16:48:38.772337] I [glusterfsd.c:2424:daemonize] 0-glusterfs: Pid of current running process is 28076
[2021-08-19 16:48:38.777452] I [socket.c:4252:ssl_setup_connection_params] 0-glusterfs: SSL support for MGMT is ENABLED IO path is ENABLED certificate depth is 3 for peer
[2021-08-19 16:48:38.834561] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
[2021-08-19 16:48:38.834620] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}]
[2021-08-19 16:48:38.847727] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: gfs009.prod.com:24007 gfs008.prod.com:24007
[2021-08-19 16:48:38.853408] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=2}]
[2021-08-19 16:48:38.853508] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=3}]
[2021-08-19 16:48:38.855828] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-0: parent translators are ready, attempting connect on transport []
[2021-08-19 16:48:38.860215] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-1: parent translators are ready, attempting connect on transport []
[2021-08-19 16:48:38.864363] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-2: parent translators are ready, attempting connect on transport []
Final graph:
+------------------------------------------------------------------------------+
  1: volume grafana-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gfs009.prod.com
  5:     option remote-subvolume /opt/data/grafana
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option transport.socket.ssl-enabled off
  9:     option event-threads 4
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option strict-locks off
 15:     option send-gids true
 16: end-volume
 17:
 18: volume grafana-client-1
 19:     type protocol/client
 20:     option ping-timeout 42
 21:     option remote-host gfs007.prod.com
 22:     option remote-subvolume /opt/data/grafana
 23:     option transport-type socket
 24:     option transport.address-family inet
 25:     option transport.socket.ssl-enabled off
 26:     option event-threads 4
 27:     option transport.tcp-user-timeout 0
 28:     option transport.socket.keepalive-time 20
 29:     option transport.socket.keepalive-interval 2
 30:     option transport.socket.keepalive-count 9
 31:     option strict-locks off
 32:     option send-gids true
 33: end-volume
 34:
 35: volume grafana-client-2
 36:     type protocol/client
 37:     option ping-timeout 42
 38:     option remote-host gfs008.prod.com
 39:     option remote-subvolume /opt/data/grafana
 40:     option transport-type socket
 41:     option transport.address-family inet
 42:     option transport.socket.ssl-enabled off
 43:     option event-threads 4
 44:     option transport.tcp-user-timeout 0
 45:     option transport.socket.keepalive-time 20
 46:     option transport.socket.keepalive-interval 2
 47:     option transport.socket.keepalive-count 9
 48:     option strict-locks off
 49:     option send-gids true
 50: end-volume
 51:
 52: volume grafana-replicate-0
 53:     type cluster/replicate
 54:     option afr-pending-xattr grafana-client-0,grafana-client-1,grafana-client-2
 55:     option favorite-child-policy majority
 56:     option use-compound-fops off
 57:     subvolumes grafana-client-0 grafana-client-1 grafana-client-2
 58: end-volume
 59:
 60: volume grafana-dht
 61:     type cluster/distribute
 62:     option lookup-optimize on
 63:     option readdir-optimize on
 64:     option lock-migration off
 65:     option force-migration off
 66:     subvolumes grafana-replicate-0
 67: end-volume
 68:
 69: volume grafana-utime
 70:     type features/utime
 71:     option noatime on
 72:     subvolumes grafana-dht
 73: end-volume
 74:
 75: volume grafana-write-behind
 76:     type performance/write-behind
 77:     option flush-behind on
 78:     option cache-size 16MB
 79:     subvolumes grafana-utime
 80: end-volume
 81:
 82: volume grafana-readdir-ahead
 83:     type performance/readdir-ahead
 84:     option parallel-readdir off
 85:     option rda-request-size 131072
 86:     option rda-cache-limit 10MB
 87:     subvolumes grafana-write-behind
 88: end-volume
 89:
 90: volume grafana-io-cache
 91:     type performance/io-cache
 92:     option cache-size 2GB
 93:     subvolumes grafana-readdir-ahead
 94: end-volume
 95:
 96: volume grafana-open-behind
 97:     type performance/open-behind
 98:     subvolumes grafana-io-cache
 99: end-volume
100:
101: volume grafana-quick-read
102:     type performance/quick-read
103:     option cache-size 2GB
104:     subvolumes grafana-open-behind
105: end-volume
106:
107: volume grafana-md-cache
108:     type performance/md-cache
109:     subvolumes grafana-quick-read
110: end-volume
111:
112: volume grafana
113:     type debug/io-stats
114:     option log-level INFO
115:     option threads 16
116:     option latency-measurement off
117:     option count-fop-hits off
118:     option global-threading off
119:     subvolumes grafana-md-cache
120: end-volume
121:
122: volume meta-autoload
123:     type meta
124:     subvolumes grafana
125: end-volume
126:
+------------------------------------------------------------------------------+
[2021-08-19 16:48:38.871921] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-0: changing port to 49152 (from 0)
[2021-08-19 16:48:38.872086] I [socket.c:849:__socket_shutdown] 0-grafana-client-0: intentional socket shutdown(13)
[2021-08-19 16:48:38.877519] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-1: changing port to 49152 (from 0)
[2021-08-19 16:48:38.877570] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}]
[2021-08-19 16:48:38.877786] I [socket.c:849:__socket_shutdown] 0-grafana-client-1: intentional socket shutdown(14)
[2021-08-19 16:48:38.880255] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-0: Connected, attached to remote volume [{conn-name=grafana-client-0}, {remote_subvol=/opt/data/grafana}]
[2021-08-19 16:48:38.880334] I [MSGID: 108005] [afr-common.c:5993:__afr_handle_child_up_event] 0-grafana-replicate-0: Subvolume 'grafana-client-0' came back up; going online.
[2021-08-19 16:48:38.883553] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-1: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}]
[2021-08-19 16:48:38.885425] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-1: Connected, attached to remote volume [{conn-name=grafana-client-1}, {remote_subvol=/opt/data/grafana}]
[2021-08-19 16:48:38.885520] I [MSGID: 108002] [afr-common.c:6363:afr_notify] 0-grafana-replicate-0: Client-quorum is met
[2021-08-19 16:48:38.886059] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-2: changing port to 49152 (from 0)
[2021-08-19 16:48:38.886218] I [socket.c:849:__socket_shutdown] 0-grafana-client-2: intentional socket shutdown(15)
[2021-08-19 16:48:38.891974] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-2: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}]
[2021-08-19 16:48:38.901301] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-2: Connected, attached to remote volume [{conn-name=grafana-client-2}, {remote_subvol=/opt/data/grafana}]
[2021-08-19 16:48:38.904360] I [fuse-bridge.c:5300:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.33
[2021-08-19 16:48:38.904408] I [fuse-bridge.c:5928:fuse_graph_sync] 0-fuse: switched to graph 0
[2021-08-19 16:51:04.399387] W [fuse-bridge.c:1387:fuse_attr_cbk] 0-glusterfs-fuse: 7681: STAT() /grafana.db-journal => -1 (Stale file handle)
[2021-08-19 16:51:04.400295] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfiddb027943-9da2-4215-a366-e604a72e3010}]
[2021-08-19 16:51:08.328621] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfid720826ed-36bb-4e88-9bdb-0dc86157c882}]
[2021-08-19 16:52:30.905560] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-grafana-client-0: remote operation failed. [{path=/grafana.db-journal}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}]
[2021-08-19 16:52:31.006928] W [fuse-bridge.c:1387:fuse_attr_cbk] 0-glusterfs-fuse: 16064: STAT() /grafana.db-journal => -1 (Stale file handle)
[2021-08-19 16:52:33.328302] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfideab087b3-0ca3-4064-829f-30cb29796e8d}]
[2021-08-19 16:56:06.337099] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidf913d531-837b-4f12-972a-cc7f2a9833f3}]
[2021-08-19 16:59:13.429520] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-grafana-client-0: remote operation failed. [{path=/grafana.db-journal}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}]
[2021-08-19 16:59:15.333468] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfid3530b62f-26bd-4328-ba52-82eeaf5e2838}]
[2021-08-19 16:59:20.328369] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidcae336ae-ee34-4d89-92b1-821a52564895}]
[2021-08-19 17:03:26.333398] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidba24faf1-d7c6-4f19-8b71-3d3eecd589d5}]
[2021-08-19 17:21:09.787909] E [MSGID: 126004] [open-behind.c:587:ob_writev] 0-grafana-open-behind: Failed to submit fop [{fop=writev}, {errno=116}, {error=Stale file handle}]
[2021-08-19 17:21:09.788000] W [fuse-bridge.c:3049:fuse_writev_cbk] 0-glusterfs-fuse: 84876: WRITE => -1 gfid=628bfe1d-e406-4bf5-a31d-4bcc61693e09 fd=0x7f07100d4248 (Stale file handle)
[2021-08-19 17:21:09.788168] E [MSGID: 126004] [open-behind.c:618:ob_flush] 0-grafana-open-behind: Failed to submit fop [{fop=flush}, {errno=116}, {error=Stale file handle}]
[2021-08-19 17:21:09.788204] W [fuse-bridge.c:1952:fuse_err_cbk] 0-glusterfs-fuse: 84877: FLUSH() ERR => -1 (Stale file handle)
[2021-08-19 17:21:09.792144] W [fuse-bridge.c:2036:fuse_unlink_cbk] 0-glusterfs-fuse: 84878: UNLINK() /grafana.db-journal => -1 (No such file or directory)
pending frames:
frame : type(1) op(LK)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-08-19 17:21:09
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 8.5
/lib64/libglusterfs.so.0(+0x27e3f)[0x7f073b57ce3f]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f073b587ce4]
/lib64/libc.so.6(+0x36400)[0x7f0739bc6400]
/usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x3b41b)[0x7f072c7e041b]
/usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x3c790)[0x7f072c7e1790]
/usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x587fd)[0x7f072c7fd7fd]
/lib64/libgfrpc.so.0(+0xf101)[0x7f073b328101]
/lib64/libgfrpc.so.0(+0xf467)[0x7f073b328467]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f073b324a93]
/usr/lib64/glusterfs/8.5/rpc-transport/socket.so(+0x4278)[0x7f072f6f4278]
/usr/lib64/glusterfs/8.5/rpc-transport/socket.so(+0x9a91)[0x7f072f6f9a91]
/lib64/libglusterfs.so.0(+0x8c91c)[0x7f073b5e191c]
/lib64/libpthread.so.0(+0x7ea5)[0x7f073a3c8ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f0739c8e9fd]
---------
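
(Note: the frames above are raw offsets into the shared objects. With the matching debuginfo installed, they can be resolved to function names, for example with binutils' addr2line; the offset below is taken from the trace above:)

# resolve one frame from the backtrace; requires the glusterfs debuginfo to be installed
addr2line -f -e /usr/lib64/glusterfs/8.5/xlator/protocol/client.so 0x3b41b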

Expected results: Fuse mount to be up & running

Mandatory info:

- The output of the gluster volume info command:

Volume Name: grafana
Type: Replicate
Volume ID: 004549c4-5299-471c-b2f4-339f3437f844
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gfs009.prod.com:/opt/data/grafana
Brick2: gfs007.prod.com:/opt/data/grafana
Brick3: gfs008.prod.com:/opt/data/grafana
Options Reconfigured:
server.event-threads: 4
performance.write-behind-window-size: 16MB
performance.readdir-ahead: enable
performance.read-ahead: disable
performance.io-thread-count: 32
performance.io-cache: on
performance.flush-behind: on
performance.client-io-threads: off
performance.cache-size: 2GB
cluster.lookup-optimize: on
client.event-threads: 4
cluster.readdir-optimize: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
cluster.favorite-child-policy: majority

- The output of the gluster volume status command:

Status of volume: grafana
Gluster process                            TCP Port  RDMA Port  Online  Pid
Brick gfs009.prod.com:/opt/data/grafana    49152     0          Y       9013
Brick gfs007.prod.com:/opt/data/grafana    49152     0          Y       8888
Brick gfs008.prod.com:/opt/data/grafana    49152     0          Y       8958
Self-heal Daemon on localhost              N/A       N/A        Y       8896
Self-heal Daemon on gfs009.prod.com        N/A       N/A        Y       9024
Self-heal Daemon on gfs008.prod.com        N/A       N/A        Y       8969

Task Status of Volume grafana

There are no active volume tasks

- The output of the gluster volume heal command:

Brick gfs009.prod.com:/opt/data/grafana
Status: Connected
Number of entries: 0

Brick gfs007.prod.com:/opt/data/grafana
Status: Connected
Number of entries: 0

Brick gfs008.prod.com:/opt/data/grafana
Status: Connected
Number of entries: 0

Additional info:

- The operating system / glusterfs version: CentOS 7.9 / glusterfs 8.5

MarvinTO avatar Aug 19 '21 18:08 MarvinTO

Looks like https://github.com/gluster/glusterfs/issues/1966 ?

mykaul avatar Aug 19 '21 20:08 mykaul

Could you post the output of thread apply all bt on the core file?
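
A minimal way to capture that, assuming gdb and the matching debuginfo packages are installed (the core file path below is hypothetical; use wherever your system writes cores):

# dump all thread backtraces from the core non-interactively
gdb -batch -ex 'thread apply all bt' /usr/sbin/glusterfs /var/core/core.glusterfs.28076 > backtrace.txt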

pranithk avatar Aug 20 '21 03:08 pranithk

Hello @pranithk, I was trying to create the backtrace file, but I got a "Missing separate debuginfos" message. Do I need to install a specific package to get this working?

Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-8.5-1.el7.x86_64

# debuginfo-install glusterfs-fuse-8.5-1.el7.x86_64
Loaded plugins: fastestmirror, remove-with-leaves
Loading mirror speeds from cached hostfile
Could not find debuginfo for main pkg: glusterfs-fuse-8.5-1.el7.x86_64
Could not find debuginfo pkg for dependency package glibc-2.17-324.el7_9.x86_64
Could not find debuginfo pkg for dependency package 1:openssl-libs-1.0.2k-21.el7_9.x86_64
Could not find debuginfo pkg for dependency package libgfrpc0-8.5-1.el7.x86_64
Could not find debuginfo pkg for dependency package libgfxdr0-8.5-1.el7.x86_64
Could not find debuginfo pkg for dependency package libglusterfs0-8.5-1.el7.x86_64
Could not find debuginfo pkg for dependency package libuuid-2.23.2-65.el7_9.1.x86_64
Could not find debuginfo pkg for dependency package zlib-1.2.7-19.el7_9.x86_64
No debuginfo packages available to install

MarvinTO avatar Aug 20 '21 16:08 MarvinTO

@tshacked I see that the release notes for 8.5 were done by you; could you help find the RPMs?

@MarvinTO It looks similar to https://github.com/gluster/glusterfs/issues/2443, which was fixed in the 8.6 release.

pranithk avatar Aug 21 '21 13:08 pranithk

Need the backtrace to confirm

pranithk avatar Aug 21 '21 13:08 pranithk

I can see that glusterfs-fuse-debuginfo-8.5-1 was generated for el8: https://cbs.centos.org/koji/buildinfo?buildID=32879
But not for el7 (it is also missing in previous el7 versions): https://cbs.centos.org/koji/buildinfo?buildID=32885
I didn't generate the actual build.
@kalebskeithley, @Shwetha-Acharya, it looks like debuginfo RPMs are not created for the CentOS 7 builds. Could you please check?

tshacked avatar Aug 22 '21 08:08 tshacked

@nixpanic

Shwetha-Acharya avatar Aug 23 '21 06:08 Shwetha-Acharya

The packages are there. Search for glusterfs-debuginfo-8.5-1.el7 on the page; el7 does not split the debuginfo per sub-package like el8 does.

nixpanic avatar Aug 23 '21 08:08 nixpanic

@nixpanic I found glusterfs-debuginfo-8.5-1.el7 in http://debuginfo.centos.org/centos/7/storage/x86_64/

Now I'm wondering if I need glusterfs-debuginfo-8.5-1.el7 or glusterfs-fuse-debuginfo-8.5-1.el7 in order to generate the backtrace.

I could not find glusterfs-fuse-debuginfo-8.5-1.el7 in the repo, and also could not find it in https://cbs.centos.org/koji/buildinfo?buildID=32885
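
If the debuginfo repo is not configured locally, one workaround sketch is installing the RPM straight from that URL; the exact filename below is an assumption, so check the directory listing first:

# install the el7 debuginfo RPM directly from debuginfo.centos.org (verify the filename)
yum install http://debuginfo.centos.org/centos/7/storage/x86_64/glusterfs-debuginfo-8.5-1.el7.x86_64.rpm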

MarvinTO avatar Aug 25 '21 22:08 MarvinTO

el7 generates a single debuginfo package per source RPM; glusterfs-debuginfo should contain everything.

el8 and newer generate a debuginfo package per sub-package, so there you can find glusterfs-fuse-debuginfo.

The CentOS Storage SIG repositories should support installing required debuginfo packages like this:

# debuginfo-install /usr/bin/glusterfs-fuse

There should not be a need to go searching for the RPMs (and its dependencies) yourself.
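
One more prerequisite worth checking before reproducing the crash: the kernel has to keep a core file at all. A rough sketch (the path and pattern below are examples, not requirements; run as root):

# example only: enable core dumps and pick a writable location
mkdir -p /var/core
ulimit -c unlimited
echo '/var/core/core.%e.%p' > /proc/sys/kernel/core_pattern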

nixpanic avatar Aug 26 '21 07:08 nixpanic

@MarvinTO did you get a resolution to this issue? I'm experiencing issues with SQLite crashing the FUSE mount on Gluster 9.5.

webash avatar Mar 29 '22 00:03 webash

Sorry for the late response; no, I didn't get a resolution to this issue, but that was because I didn't follow up on this thread.

The clients for this volume decided to replace it with a Google regional disk, so I was not able to reproduce the issue again (since the environment changed).

MarvinTO avatar Apr 07 '22 21:04 MarvinTO

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking it as stale; it will be closed in 2 weeks if no one responds with a comment here.

stale[bot] avatar Nov 12 '22 05:11 stale[bot]

Not stale. Don't close.

webash avatar Nov 12 '22 10:11 webash

Same problem with glusterfs-client 9.2-1 (arm64, running on a Raspberry Pi 4).

I put a Vaultwarden data directory onto my gv1 volume, which gets mounted in a Docker container. The data directory contains a db.sqlite3 file. While the gv1 volume had been running very stably before, I am now experiencing frequent crashes. The volume is shared between multiple Docker containers, and none of them is usable afterwards.

Output from systemd on the client side:

● retry-glusterfs-vol1-mount.service - Retry glusterfs mount
     Loaded: loaded (/etc/systemd/system/retry-glusterfs-vol1-mount.service; enabled; vendor preset: enabled)
     Active: active (exited) since Thu 2023-01-19 01:24:08 CET; 18h ago
    Process: 1690 ExecStart=/usr/bin/mount -t glusterfs pi2.home:/gv1 /srv/glusterfs/vol1 (code=exited, status=0/S>
   Main PID: 1690 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 4164)
        CPU: 36min 56.656s
     CGroup: /system.slice/retry-glusterfs-vol1-mount.service

Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: dlfcn 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: libpthread 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: llistxattr 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: setfsid 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: spinlock 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: epoll.h 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: xattr.h 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: st_atim.tv_nsec 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: package-string: glusterfs 9.2
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: ---------
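
To pull the complete crash report out of the journal (the block above is truncated), filtering on the syslog identifier seen in those lines should work; the date is just an example:

# capture everything the mount process logged around the crash
journalctl -t srv-glusterfs-vol1 --since '2023-01-19' --no-pager > gluster-crash.log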

bambigoreng avatar Jan 19 '23 19:01 bambigoreng

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking it as stale; it will be closed in 2 weeks if no one responds with a comment here.

stale[bot] avatar Sep 17 '23 06:09 stale[bot]