
[Request] Timeout handling.

Open kokoko3k opened this issue 4 years ago • 4 comments

First, I'd like to thank you for bindfs. My use case is probably a bit different from the intended one: I use it to mirror problematic network shares such as SMB and NFS ones, which live in kernel space. Since bindfs lives in userspace, I can gracefully recover when the share does not answer. I can just kill the bindfs process, umount it and remount it, so that the application does not hang forever. I made a script and a service that take care of that, see here: https://bbs.archlinux.org/viewtopic.php?pid=1898333#p1898333

If you have time, could you consider adding a timeout parameter to the bindfs mount options that restarts the bindfs mount, or just kills it, when the "mirrored" path does not answer?
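For illustration only, and not the script linked above: a minimal Python sketch of the kind of check such a timeout implies. It probes the mirrored path from a short-lived child process and treats "no answer within the deadline" as a hang. The names `path_responds`, `check_once`, `probe_argv` and `on_hang` are hypothetical; what `on_hang` does (kill bindfs, umount, remount) is left to the caller.

```python
import subprocess
import sys


def path_responds(path, timeout_s=5.0, probe_argv=None):
    """Return True if a child process finishes probing `path` within timeout_s seconds."""
    if probe_argv is None:
        # Default probe: a tiny child that only stats the path.
        probe_argv = [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path]
    probe = subprocess.Popen(
        probe_argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        probe.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill but do not wait again: a child blocked inside the kernel
        # may not exit until the share answers.
        probe.kill()
        return False
    # Any exit counts as an answer, even an error such as "no such file".
    return True


def check_once(path, on_hang, timeout_s=5.0):
    """Probe once; call the caller-supplied on_hang() if the path did not answer."""
    if path_responds(path, timeout_s):
        return True
    on_hang()  # assumption: the caller restarts the mount here
    return False
```

The probe runs in a separate process rather than a thread because a thread blocked on a hung share cannot be interrupted from inside the same program.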

Thank you again!

kokoko3k, Apr 15 '20 09:04

Hi and sorry for taking a while to reply.

Ideally the network filesystems would have their own timeouts, but if they commonly don't, then this seems like a reasonable feature. Unfortunately it's not a trivial feature and I have less free time than I used to, so I can't make promises on when and if I'll do this. Pull requests are welcome.

There are a few ways this could be implemented:

(1) There could be a watchdog thread that gets activated for the duration of each operation. It'd kill and restart bindfs if an operation takes too long. This would be pretty much equivalent to your script.

(2) Bindfs could forward I/O operations to a subprocess and kill and restart it if it times out.

I'm leaning towards (2). While (1) might seem simpler and have less perf overhead, the mount point can temporarily look empty while bindfs restarts, and I'm not sure how much of a source of bugs and confusion for users that would be. (2) would also allow for automatic retries.
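To make (2) a bit more concrete, here is a rough sketch in Python rather than bindfs's C, and not based on any bindfs code. Each operation is handed to a child process; if no answer arrives in time, the child is killed and the operation is retried. `call_with_timeout` and `_worker` are hypothetical names, and the "fork" start method is an assumption that limits the sketch to Unix.

```python
import multiprocessing as mp
import os

# Assumption: "fork" keeps the sketch simple but is Unix-only.
_ctx = mp.get_context("fork")


def _worker(conn, func, args):
    """Run one operation in the child and send back ("ok", result) or ("err", exc)."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("err", exc))
    finally:
        conn.close()


def call_with_timeout(func, args=(), timeout_s=5.0, retries=1):
    """Run func(*args) in a child process; kill it and retry if it does not answer."""
    for _ in range(retries + 1):
        reader, writer = _ctx.Pipe(duplex=False)
        child = _ctx.Process(target=_worker, args=(writer, func, args), daemon=True)
        child.start()
        writer.close()  # the parent keeps only the read end
        answered = reader.poll(timeout_s)
        try:
            status, value = reader.recv() if answered else ("timeout", None)
        except EOFError:
            status, value = "timeout", None  # child died without answering
        finally:
            reader.close()
        if status == "ok":
            child.join()
            return value
        if status == "err":
            child.join()
            raise value  # the operation itself failed; pass the error on
        child.kill()
        child.join(timeout=1.0)  # bounded wait: a child stuck in the kernel may linger
    raise TimeoutError(f"no answer after {retries + 1} attempt(s)")
```

A call such as `call_with_timeout(os.listdir, ("/mnt/share",), timeout_s=10.0)` (the path is a placeholder) either returns the listing, re-raises the error from the operation, or raises `TimeoutError` once the retries are used up. A child that is blocked in uninterruptible sleep will not actually exit on `kill()` until the kernel call returns, so this bounds how long the caller waits, not how long the stuck process lives.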

(I'll also note that if a network FS really hangs forever, then I'd highly suspect that killing the caller won't stop the hang from effectively leaking kernel memory and/or a file descriptor and/or a zombie PID. Again it'd be better to implement a timeout in the network FS.)

mpartel, Apr 25 '20 08:04

Hello, I can understand that a network filesystem can hang forever because it tries hard not to interrupt the workflow, and that can even be desirable with an unreliable network, but as far as I could see, neither NFS nor CIFS provides a way to report an error instead of retrying indefinitely. To be honest, I did not even try to request such a feature. I think that if they never implemented it, they would not do so now; maybe it is a "restriction" defined in the protocol itself, dunno.

It is true that seeing a mounted share as empty could scare the user, but the timeout behaviour could just be disabled by default, so that the user who activates it knows what is going on.

I really don't know how to write C code, sorry. Thanks a lot for considering my proposal.

kokoko3k, Apr 25 '20 09:04

Looking at https://manpages.debian.org/testing/cifs-utils/mount.cifs.8.en.html I see options like

handletimeout
soft
noresilienthandles
nopersistenthandles
echo_interval

some of which are on by default.

These (especially 'soft') look like they should cause timeouts instead of hangs. So if those are all set appropriately and CIFS still hangs, it seems like a bug in CIFS.

Unless the server (or server cluster?) has an internal issue but stays alive enough to respond to the client's echo request.

mpartel, Apr 25 '20 11:04

"soft" is on by default and it never worked for me in years, i doubt it is intended to make the client to timeout, maybe when they write "not hang", they mean that you can sent SIGINT to the app and it exits without the need for a SIGKILL (?) Also, echo_interval is set to 60 seconds, but it does not trig any timeout either. Other options i can see all refers to how the server should behave when the client reconnects, but my problem comes before it :)

kokoko3k, Apr 25 '20 11:04