Mount hangs when `cli` called with `asyncio.run_in_executor`
I have been experimenting with ratarmount and it is mostly working nicely so far, thanks.
A problem I have hit, and am struggling to figure out, is that when I call `cli(args)` directly from my code, mounting works as expected.
When I shift the same code to run under `asyncio.run_in_executor`, the mountpoint is created but the program hangs as if it's running in the foreground.
When I press Ctrl-C, my escape hatch unmounts successfully as expected.
I'm wondering if there is some TTY detection or similar going on in the fuse lib, and whether there is a way to force return/backgrounding.
Possibly related #76
sys info:
fuse/stable,now 2.9.9-5 amd64 [installed]
#1 SMP Debian 5.10.113-1 (2022-04-29) GNU/Linux
ratarmount==0.11.3
PYTHON_VERSION = "3.10.2"
There doesn't appear to be any difference in os.environ inside/outside the executor, so it doesn't appear to be TTY detection.
Care has to be taken because the daemonizing forks into the background and I have to join all threads, e.g., for the parallel bz2 decoder, before that fork and reopen it after it has forked. Aside from that, I'm out of ideas regarding problems when daemonizing :/
I think it may be this: https://github.com/libfuse/libfuse/issues/382
Although I'm somewhat confused, as I'm using a ProcessPoolExecutor, which should theoretically banish thread problems. Either way, I think it has something to do with signal handling.
I have tried stepping through/into the code with pdb, and it hangs on the call to:
err = _libfuse.fuse_main_real(
...
in fusepy/fuse.py
possibly related https://groups.google.com/g/comp.lang.python/c/tkS3VvyLD1M
another ~related discussion https://groups.google.com/g/python-tulip/c/91NCCqV4SFs
This seems to resolve it, although I'm not at all clear on the implications:
import multiprocessing
multiprocessing.set_start_method('forkserver')
Actually, I think it didn't work. It instead throws an error: "A process in the process pool was terminated abruptly while the future was running or pending."
I can work around it by avoiding run_in_executor and just calling the tool, so it's not a huge blocker for us. It also speeds things up considerably compared to copying out tarballs everywhere, so thanks again.
It would still be good to know what is going on, though 8/
:) Good to hear that it can be worked around.
Currently, I'm a bit low on time especially as I want to prioritize the custom-written parallel random access gzip backend pragzip but I might take a closer look at it in the future especially as there is still that MacOS issue that you referenced open. Furthermore, I still have on my to-do list to port ratarmount to FUSE3, which would require using something other than fusepy. Maybe that also alleviates or worsens the problem.
Maybe you could also post a minimal example so I can reproduce it easily for testing purposes?
> Maybe you could also post a minimal example so I can reproduce it easily for testing purposes?
Yep, I was thinking similar. I'll follow up on this tomorrow.
It may even isolate the problem and make it obvious, but I'm out of tricks atm. I think the forked proc is somehow having its signals managed in a way that doesn't play well with fuse, but I haven't managed to debug anything more.
possibly related signals http://curiousthing.org/sigttin-sigttou-deep-dive-linux
I've seen a few mentions of Mac and Linux backgrounding issues related to SIGTTOU.
It's not clear whether this is the issue or how to resolve it, but it seems related.
Working on a minimal repro, I realized a few things 8/
Firstly, the process exits (as if by sys.exit) when the mount is successful, afaict. When I thought my script was succeeding, it was just exiting after mounting the directory.
I'm not sure exactly how this is being triggered. I tried both catching SystemExit and adding atexit.register(fun), but neither catches the exit.
The only time this doesn't happen is if there is a failure, from which I'm deducing, rightly or wrongly, that something low-level in FUSE is bypassing the normal Python lifecycle.
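One possible reading of this observation, stated as an assumption rather than something confirmed in this thread: if the daemonizing code ends the original process with a C-level `_exit()` after forking, then neither `SystemExit` nor `atexit` handlers would ever get a chance to run. The sketch below does not touch FUSE at all. It uses Python's own `os._exit` as a stand-in and compares it with `sys.exit` in a child interpreter; `run_child` and `CHILD_TEMPLATE` are hypothetical names made up for the illustration.

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter: it registers an atexit handler, then
# exits using the call substituted for the exit_call placeholder.
CHILD_TEMPLATE = textwrap.dedent("""
    import atexit, os, sys
    atexit.register(lambda: print("atexit handler ran", flush=True))
    try:
        {exit_call}
    except SystemExit:
        print("caught SystemExit", flush=True)
""")


def run_child(exit_call):
    """Run the child script with the given exit call and return its stdout."""
    script = CHILD_TEMPLATE.format(exit_call=exit_call)
    result = subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # sys.exit raises SystemExit, so both the except clause and atexit fire.
    print("sys.exit(0) ->", run_child("sys.exit(0)").split("\n"))
    # os._exit ends the process immediately; no Python-level cleanup runs.
    print("os._exit(0) ->", run_child("os._exit(0)").split("\n"))
```

If the real exit path behaves like the `os._exit` case, that would be consistent with the script disappearing without any Python-level hook firing, but that link is a guess.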
Anyhow, exit issues aside, the following code mounts in the background:
import pathlib
import tarfile
import tempfile

import ratarmount


def create_tarball(output):
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp).joinpath("foo.txt").write_text("BAR")
        with tarfile.open(output, "w:gz") as tar:
            tar.add(tmp, arcname=".")
    return output


def mount_dir(tarball):
    ratarmount.cli((tarball, "tmpmount"))


tarball = create_tarball("baz.tar.gz")
mount_dir(tarball)
But the asyncio/executor version sits in the foreground without exiting:
import asyncio
import concurrent.futures


async def mount_dir_async(tarball):
    asyncio.get_event_loop().run_in_executor(
        concurrent.futures.ProcessPoolExecutor(),
        mount_dir,
        tarball)


asyncio.run(mount_dir_async(tarball))
Btw, I'm not sure if it applies to your use case but you could also avoid forking into the background by specifying --foreground and then keep that process or thread in the background yourself. That way it would also be correctly closed on exit.
You could also use ratarmount as a library but that might require more adaption in your code.
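The `--foreground` suggestion above could be sketched roughly as follows: start ratarmount in the foreground as a child process, wait until the mount point shows up, and unmount on the way out. This is only a sketch under assumptions. It assumes a `ratarmount` executable on `PATH` and uses only the `--foreground` and `-u` options mentioned in this thread; `mount_command`, `wait_until` and `mounted` are hypothetical helper names, and polling `os.path.ismount` is just one way to detect that the mount is ready.

```python
import contextlib
import os
import subprocess
import time


def mount_command(tarball, mountpoint):
    # --foreground keeps ratarmount attached instead of daemonizing.
    return ["ratarmount", "--foreground", tarball, mountpoint]


def wait_until(predicate, timeout=10.0, interval=0.05):
    """Poll predicate() until it returns true; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())


@contextlib.contextmanager
def mounted(tarball, mountpoint):
    """Keep a foreground ratarmount process alive for the with-block."""
    proc = subprocess.Popen(mount_command(tarball, mountpoint))
    try:
        if not wait_until(lambda: os.path.ismount(mountpoint)):
            raise TimeoutError(f"{mountpoint} was not mounted in time")
        yield mountpoint
    finally:
        # Unmount with the call suggested in this thread, then reap the child.
        subprocess.run(["ratarmount", "-u", mountpoint], check=False)
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.terminate()
            proc.wait()
```

Usage would look like `with mounted("baz.tar.gz", "tmpmount") as path: print(os.listdir(path))`. Because the child never daemonizes, the calling script stays in control of when the mount goes away.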
> Im wondering if there is some tty detection or similar going on in the fuse lib, and whether there is a way to force return/backgrounding
Note that your example code goes into background when using ThreadPoolExecutor instead of ProcessPoolExecutor.
The magic for daemonizing happens in this fork in libfuse.
Afaik, it starts a daemonized child process and then the actual process simply finishes and quits. However, this does not explain why it does not work in the ProcessPoolExecutor...
I'm not sure what exactly your intentions were but if you want to mount it and use the mount point in the same script, then the previously recommended library would be the way to go.
If you want to use FUSE because another function or library wants an existing file system path, then I don't see the problem with the current behavior. The only missing link would be that you call ratarmount -u before exiting so that your python program does not hang:
import asyncio
import concurrent.futures
import os
import time

# ratarmount and tarball come from the first example above.


def mount_dir(tarball):
    ratarmount.cli(("-f", tarball, "tmpmount"))


async def mount_dir_async(tarball):
    asyncio.get_event_loop().run_in_executor(
        concurrent.futures.ProcessPoolExecutor(),
        mount_dir,
        tarball)


async def do_something_with_the_mount_async():
    # We need to wait at least until the mount point got created and initialized
    # There might be a better way to do this...
    time.sleep(2)
    print("Contents of mount point:", os.listdir("tmpmount"))


async def runAsync():
    mountTask = asyncio.create_task(mount_dir_async(tarball))
    doTask = asyncio.create_task(do_something_with_the_mount_async())
    await doTask
    print("Unmounting now ...")
    ratarmount.cli(("-u", "tmpmount"))
    await mountTask
    print("Finished")


asyncio.run(runAsync())
Output:
fusermount: entry for tmpmount not found in /etc/mtab
Creating new SQLite index database at baz.tar.gz.index.sqlite
Creating offset dictionary for baz.tar.gz ...
Creating offset dictionary for baz.tar.gz took 0.00s
Writing out TAR index to baz.tar.gz.index.sqlite took 0s and is sized 24576 B
Contents of mount point: ['foo.txt']
Unmounting now ...
Finished
This does work somewhat even though it is clunky. I'm no expert on async usage in Python... That's why at times I get a weird error when it tries to exit this script:
exception calling callback for <Future at 0x7fc0eb816410 state=finished returned NoneType>
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/lib/python3.10/asyncio/futures.py", line 399, in _call_set_state
    dest_loop.call_soon_threadsafe(_set_state, destination, source)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 795, in call_soon_threadsafe
    self._check_closed()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
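A likely cause of this intermittent error, going by the traceback: `mount_dir_async` never awaits the future returned by `run_in_executor`, so `await mountTask` returns immediately and `asyncio.run` can close the loop before the executor job reports back. A minimal sketch of awaiting the future instead is shown below. `blocking_job` is a hypothetical stand-in for the blocking `mount_dir` call, and a thread pool is used only to keep the sketch self-contained; the example in this thread uses a process pool.

```python
import asyncio
import concurrent.futures
import time


def blocking_job(seconds):
    # Hypothetical stand-in for the blocking mount_dir(tarball) call.
    time.sleep(seconds)
    return "job finished"


async def run_blocking(executor, seconds):
    loop = asyncio.get_running_loop()
    # Awaiting the future keeps this coroutine pending until the job is done,
    # so the event loop cannot be closed while the job is still running.
    return await loop.run_in_executor(executor, blocking_job, seconds)


async def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        job_task = asyncio.create_task(run_blocking(executor, 0.2))
        # Other coroutines can make progress while the job runs.
        await asyncio.sleep(0)
        return await job_task


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a foreground mount as the job, the unmount call still has to happen before the final `await`, since the mount call only returns once the file system has been unmounted.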
If you want your own program to go into the background, you still shouldn't depend on ratarmount to do that, because it is probably not possible in the first place. You could, however, just like in the above example, start your own daemonized subprocess, which then starts ratarmount normally in a sub-subprocess without daemonizing. Something like:
process = multiprocessing.Process(target=lambda: asyncio.run(runAsync()), daemon=True)
process.start()
I can't get it to work correctly right now.
Feel free to reopen this or another issue if you still have trouble but I don't see a fix in ratarmount for the daemonizing part and the "hang" should be fixed by calling fusermount -u or ratarmount -u.