uvloop icon indicating copy to clipboard operation
uvloop copied to clipboard

Blocking loop.subprocess_shell() and loop.subprocess_exec()

Open soonum opened this issue 6 years ago • 20 comments

  • uvloop version: 0.9.0
  • Python version: 3.6.1
  • Platform: UNIX
  • Can you reproduce the bug with PYTHONASYNCIODEBUG in env?: n.a.

Hi there,

I'm facing the same problem encountered in CPython issue #414. Since the patch (see issue #428) is not upstreamed in asyncio yet, I was wondering if there is any progess on this in uvloop. I've tested that behavior with the 0.9.0 version and it seems to be still blocking.

@1st1, have you planned to fix this? Do you need help at some point? (I was working with @martiusweb on the codebase which led to those issues)

soonum avatar Nov 27 '17 14:11 soonum

@1st1, have you planned to fix this?

Yes. Will take a look in a week or two. If you have time to make a PR I'll review it.

1st1 avatar Nov 28 '17 17:11 1st1

Actually, uvloop can block only if you supply a preexec_fn argument to loop.subprocess_shell() or loop.subprocess_exec(). Could you please share your test?

1st1 avatar May 25 '18 02:05 1st1

@soonum could you please share your test as requested by @1st1 ?

jlaine avatar Mar 16 '19 16:03 jlaine

Sorry I wasn't following up this old issue. In fact my use case is that one, I provide a preexec_fn argument to these methods and as a result, the loop is blocking.

soonum avatar Mar 18 '19 14:03 soonum

We have a component in Home Assistant; they call ffmpeg with a subprocess (async). They also end sometimes (less and more) in blocking and not working event loop :( I can't provide any test code because it's not all time simple to reproduce it in a laboratory situation.

The point is, we don't use preexec_fn and end in a blocking loop :(

pvizeli avatar Mar 22 '19 10:03 pvizeli

The problem is forking. It looks like the current implementation doesn't work on a heavy load of the main process.

pvizeli avatar Apr 25 '19 19:04 pvizeli

IIRC, the call to fork() can be slow because of the internals of the memory management (I think it's in O(n) with the number of memory pages of the forked process). Usually the workaround is to use vfork() when knowing that exec() will be called right after, but I don't think that this option is safe in Python.

I think that the problem is more about a blocking IPC used to synchronize the parent and the child to transmit failures in preexec_fn. I made a PR that got lost for cpython with vanilla asyncio years ago for this problem: https://github.com/python/asyncio/pull/428

Cheers

Le jeu. 25 avr. 2019 à 21:46, Pascal Vizeli [email protected] a écrit :

The problem is forking. It looks like the current implementation doesn't work on a heavy load of the main process.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MagicStack/uvloop/issues/118#issuecomment-486812726, or mute the thread https://github.com/notifications/unsubscribe-auth/AABPWGGLH3YZJGF6VZRNQE3PSIDAVANCNFSM4EFO3FLQ .

-- Martin http://www.martiusweb.net Richard www.martiusweb.net

Martiusweb avatar Apr 26 '19 16:04 Martiusweb

The point is, if we use subprocess inside executors, it work without issues. Also call the normal subprocess inside loop work with attach a streamreader to output. But the subprocess from uvloop hungs

pvizeli avatar Apr 30 '19 22:04 pvizeli

any news to this? the HA community is waiting for a solution wistfully ;)

nicx avatar Jun 09 '19 12:06 nicx

Hm very sad that its not going on here :( I would love using the HA stream component... @1st1 any plan about this?

nicx avatar Aug 06 '19 07:08 nicx

@1st1 any plan about this?

I still haven't received any clear answer to https://github.com/MagicStack/uvloop/issues/118#issuecomment-391923149.

I know that uvloop is blocking when preexec_fn is used, but almost nobody uses it. Other than that, as @Martiusweb explained in his comment https://github.com/MagicStack/uvloop/issues/118#issuecomment-487112929 it should not block.

@pvizeli

But the subprocess from uvloop hungs

Does uvloop behave differently from asyncio in your case?

1st1 avatar Oct 22 '19 20:10 1st1

@1st1 yes. The reason why we remove it from Home Assistant. However, https://github.com/MagicStack/uvloop/pull/268 fix the problem and we can reuse it after that is merged.

pvizeli avatar Oct 22 '19 20:10 pvizeli

Great, I'm working on a release right now, and will likely merge #268 or a variation of it.

1st1 avatar Oct 22 '19 21:10 1st1

  • uvloop version: 0.14.0

Hi there,

I got some details after encountering another blocking case. I found that if you provide a value for the keyword argument cwd and experiencing a slow filesystem at runtime, the call to loop.subprocess_[exec|shell]() would block.

Here is a simple test to reproduce the behavior. It's relying on slowfs.

  1. create a directory /home_slow
  2. run the following command <path_to>/slowfs/env/bin/slowfs -c <path_to>/slowfs/slowfs/slowfs.cfg --allow-other /home /home_slow/
  3. in another shell, run the python (tested with Python v3.8.0) example subprocess_test.py

You should see that Python is blocking during 5 seconds. This is related to the delay enforced by slowfs.

Here is the content of subprocess_test.py:

import asyncio
import datetime
import subprocess
import uvloop

async def subprocess_task():
    await asyncio.sleep(2)
    process = await asyncio.create_subprocess_shell("touch a_file.txt" , cwd="/home_slow/",) 
    await process.wait()
    print("Process terminated with code", process.returncode, flush=True)


async def print_task():
    """
    Print seconds elapsed since beginning of the program at 1Hz.
    """
    start_time = datetime.datetime.now()
    while True:
        now = datetime.datetime.now()
        print(round((now - start_time).total_seconds()), flush=True)
        await asyncio.sleep(1)


if __name__ == "__main__":
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    loop = asyncio.get_event_loop()
    task_1 = loop.create_task(print_task())
    task_2 = loop.create_task(subprocess_task())
    
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        task_1.cancel()
        task_2.cancel()
        loop.close()

Here's the config used in slowfs.cfg:

access = 5
chmod = 0
chown = 0
getattr = 5
readdir = 5
readlink = 0
mknod = 0
rmdir = 0
mkdir = 0
statfs = 5
unlink = 0
symlink = 0
rename = 0
link = 0
utimens = 0
open = 0
create = 0
read = 0
write = 0
truncate = 0
flush = 0
release = 0
fsync = 0

By the way, #268 doesn't solve the issue regarding using a preexec function. The way I tested it is by defining a function that does a simple time.sleep(5) to see the loop blocking.

Hope that could help to slove this issue.

soonum avatar Dec 16 '19 10:12 soonum

hi,

would be concurrent.futures import ProcessPoolExecutor a workaround?

madkote avatar Mar 31 '20 18:03 madkote

Here is a simple test to reproduce the behavior. It's relying on slowfs.

Nice, that's great! Thanks for the slowfs tip, I didn't know about it before.

would be concurrent.futures import ProcessPoolExecutor a workaround?

Unfortunately no, they would still block.

1st1 avatar Apr 01 '20 02:04 1st1

@1st1 thanks for reply. So the fix would be at the moment to make subprocess async -> run subprocess in thread (with some async flavour).

`would be concurrent.futures import ProcessPoolExecutor a workaround?

Unfortunately no, they would still block.`

madkote avatar Apr 01 '20 08:04 madkote

@1st1 here is possible fix

def run_subprocess(cmd, loop=None):
    loop = loop or asyncio.get_event_loop()
    try:
        process = await asyncio.create_subprocess_shell(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except NotImplementedError:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True) as process:
            try:
                result = await loop.run_in_executor(None, process.communicate)
            except Exception:  # muh pycodestyle
                def kill():
                    process.kill()
                    process.wait()

                await loop.run_in_executor(None, kill)
                raise

    else:
        result = await process.communicate()

    return [res.decode('utf-8') for res in result] ```

madkote avatar Apr 01 '20 08:04 madkote

OK, this turns out to be an issue in libuv.

libuv created a pipe (marked close-on-exec) for the parent process to wait for the execve() in the child, using a blocking read() to wait for the EOF. However in the child, it has to execute chdir() before the execve(), therefore the parent process / event loop is blocked.

At the same time, CPython subprocess (used by asyncio) seems to have a similar design and the same issue. It creates an errpipe, uses a blocking read() to wait on it, and the subprocess also chdir() before execve().

fantix avatar Feb 09 '21 17:02 fantix

I don't know if it's related to this issue, but I may have another case that might help shed light on this. I haven't got it reduced to a self-contained reproducible example yet, but I could probably do that with only an hour of work if it would help.

In my situation, under Python 3.8.0 (custom ARM v7 build, nothing special about it), with either uvloop 0.14.0 or 0.15.3, it appears if I cancel() a task that is launching a subprocess via asyncio.create_subprocess_exec(*cmd, close_fds=True, stdin/out/err=DEVNULL), I can end up with the entire event loop blocked (i.e. stuck). There's a thread (from the watcher, I assume) still running, with /proc/PID/wchan saying it's blocked on a kernel call to futex_wait_queue_me, but the actual subprocess has already exited, or conceivably never got launched. The last thing I see is a line of debug logging from immediately before the exec call.

I've got a stress test loop running that triggers the task that does this, cancelling it a random amount of time later, from 0.001s to a few seconds. I haven't yet narrowed down exactly when this occurs, but so far it's only when the create_subprocess_exec() call has not yet returned. If it manages to return the Process object, it never blocks (and the following debug log output shows up).

Update/edit: it appears possible the fact I'm sometimes cancelling just as the subprocess launches is not part of the problem. I'm continuing to try reducing this to its simplest form...

peter9477 avatar Jul 23 '21 21:07 peter9477