
writing to fuse device yielded ENOENT


Following on from issues #1741 and #3498, we were experiencing very slow response times when accessing files on a GlusterFS 9.6 system on an Ubuntu 18.04 server. The server in question is both a GlusterFS node and a client.

Listing directory contents via the FUSE mount typically took 2-10 seconds, whereas the same listing from a different client was fast. In mnt-glusterfs.log we saw lots of warnings like this:

[2023-04-03 20:16:14.789588 +0000] W [fuse-bridge.c:310:check_and_dump_fuse_W] 0-glusterfs-fuse: writing to fuse device yielded ENOENT 256 times
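
For anyone checking their own client log, the warnings can be tallied roughly like this (a sketch only; the /var/log/glusterfs/ prefix is assumed, since only the mnt-glusterfs.log name appears above):

# rough per-day count of the FUSE ENOENT warnings in the client log
grep 'writing to fuse device yielded ENOENT' /var/log/glusterfs/mnt-glusterfs.log | awk '{print $1}' | sort | uniq -c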

After running "echo 3 > /proc/sys/vm/drop_caches" as suggested in issue #1471, the response time improved dramatically to around 0.009s, the same as the other client.
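
For comparison, the before/after timings can be reproduced with something along these lines (the /mnt/glusterfs mount point is an assumption based on the mnt-glusterfs.log name):

# time a directory listing on the FUSE mount
time ls -l /mnt/glusterfs/ > /dev/null
# flush the kernel dentry/inode/page caches, then time the same listing again
sync
echo 3 > /proc/sys/vm/drop_caches
time ls -l /mnt/glusterfs/ > /dev/null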

Can you please advise how we should tune GlusterFS to avoid this problem? I see mention of the --lru-limit and --invalidate-limit options in that issue but, to be honest, I don't understand how to use the warning messages to decide on suitable values for those options. Thanks in advance.
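
For reference, my understanding is that those options are passed to the FUSE client roughly like this (a sketch only; the mount point and the limit values are placeholders, not recommendations, and I am not certain mount.glusterfs accepts both option names in this form on every version):

# as mount options when remounting the volume
umount /mnt/glusterfs
mount -t glusterfs -o lru-limit=65536,invalidate-limit=16 br:/gvol0 /mnt/glusterfs

# or directly on the glusterfs client process
glusterfs --volfile-server=br --volfile-id=gvol0 --lru-limit=65536 --invalidate-limit=16 /mnt/glusterfs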

Here are the GlusterFS details:

root@br:~# gluster volume info
 
Volume Name: gvol0
Type: Replicate
Volume ID: 2d2c1552-bc93-4c91-b8ca-73553f00fdcd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: br:/nodirectwritedata/gluster/gvol0
Brick2: sg:/nodirectwritedata/gluster/gvol0
Options Reconfigured:
cluster.min-free-disk: 20%
network.ping-timeout: 10
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
storage.health-check-interval: 0
cluster.server-quorum-ratio: 50
root@br:~# 
root@br:~# gluster volume status
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick br:/nodirectwritedata/gluster/gvol0   49152     0          Y       4761 
Brick sg:/nodirectwritedata/gluster/gvol0   49152     0          Y       2329 
Self-heal Daemon on localhost               N/A       N/A        Y       5304 
Self-heal Daemon on sg                      N/A       N/A        Y       2629 
 
Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks
 
root@br:~# 
root@br:~# gluster volume heal gvol0 info summary
Brick br:/nodirectwritedata/gluster/gvol0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick sg:/nodirectwritedata/gluster/gvol0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

davidvoisonics avatar Apr 03 '23 23:04 davidvoisonics

How long does it typically take to have someone look into this, please?

davidvoisonics avatar Apr 12 '23 04:04 davidvoisonics

I have the same problem here; do you have any solutions? @davidvoisonics sorry for bumping, but thanks in advance.

eliphatfs avatar Mar 09 '24 11:03 eliphatfs

I set --invalidate-limit to 32 or 64, but it doesn't help...

eliphatfs avatar Mar 10 '24 20:03 eliphatfs

Hello,

Could you update this issue, please? It's quite impossible to use GlusterFS in production, even in version 10, because the administration is awful; the logs and issues don't help because you don't respond to them! I set up a GlusterFS cluster (2 replicas and 1 arbiter) two years ago and keep it updated, but I still spend a lot of time administering, debugging, and checking logs and issues...

Take this type of error: "writing to fuse device yielded ENOENT 256 times". I checked old issues from Gluster version 4, and the only advice you gave was to update to the latest version... So the problem is still there in the latest version!!

Maybe because Red Hat left you alone, you have no time to maintain the GlusterFS project, like you said... Ceph will be the solution for us, I think.

bdoublet91 avatar Mar 17 '24 10:03 bdoublet91

This is not a lot of fun. IBM crashed another party.

TheWitness avatar Jun 10 '24 19:06 TheWitness

Does anyone have steps to reproduce this issue locally?

pranithk avatar Jan 20 '25 11:01 pranithk