sys.exit() caught in async event loop exception handler
| BPO | 25489 |
|---|---|
| Nosy | @gvanrossum, @bitdancer, @asvetlov, @1st1, @iritkatriel |
| Files | |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = None
created_at = <Date 2015-10-27.11:25:55.941>
labels = ['expert-asyncio']
title = 'sys.exit() caught in async event loop exception handler'
updated_at = <Date 2022-03-22.13:11:35.019>
user = 'https://bugs.python.org/jinty'
bugs.python.org fields:
activity = <Date 2022-03-22.13:11:35.019>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['asyncio']
creation = <Date 2015-10-27.11:25:55.941>
creator = 'jinty'
dependencies = []
files = ['40867', '40869']
hgrepos = []
issue_num = 25489
keywords = ['patch']
message_count = 27.0
messages = ['253529', '253540', '253543', '253544', '253549', '253552', '253554', '253555', '253556', '253557', '253558', '253576', '253594', '253601', '253603', '253686', '253723', '253746', '415599', '415618', '415619', '415620', '415621', '415622', '415737', '415738', '415742']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'r.david.murray', 'asvetlov', 'yselivanov', 'jinty', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue25489'
versions = ['Python 3.5']
Running the attached file with python3 shows that SystemExit is caught rather than causing the process to stop. That's quite surprising.
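The attached file is not included in the migration. The following is a hedged reconstruction based on the tracebacks and descriptions later in this thread; the handler, the boom() coroutine, and the exit code 42 come from those messages, while everything else (including the modern async/await syntax) is an assumption.

```python
# Reconstructed sketch of test_sys_exit_in_exception_handler.py -- not the
# original attachment.
import asyncio
import sys


def handler(loop, context):
    print('Got error, exiting')
    sys.exit(42)  # expected to terminate the process, but is swallowed


async def boom():
    await asyncio.sleep(0)   # yield to the loop at least once
    raise Exception('boom')  # never retrieved; reported from the finalizer


loop = asyncio.get_event_loop()
loop.set_exception_handler(handler)
loop.create_task(boom())  # deliberately never awaited
loop.run_forever()        # keeps running: the SystemExit never escapes
```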
I expected this function to be parallel to sys.excepthook, but I see that sys.excepthook does not get called if you call sys.exit() at the python prompt. So I guess I agree with you that it is surprising, although I actually expected the opposite (that sys.excepthook would get called for sys.exit).
the workaround I am using at the moment is this:
```python
def handler(loop, context):
    print('Got error, exiting')
    loop.call_soon(sys.exit, 42)
```
which actually does cause the process to exit
Ah, I misunderstood your report, because I didn't actually run the example. The exception is being ignored because it is raised during a __del__ method execution. This has nothing to do with set_exception_handler. And in fact if you raise sys.exit in an excepthook handler it is ignored completely, so the cases are parallel in that sense. Interestingly, if you replace your Exception in boom with sys.exit, your sys.exit in the handler actually works, because in that case the handler isn't getting called from the Task __del__.
I don't think there's a bug in python here, I think the bug is in your program. You aren't yielding from the task, so the exception from boom only gets dealt with during Task finalization. I find this to be the most confusing part of asyncio programming, myself. I don't know why the handler doesn't get called just from creating the Task and running the loop. I often end up writing wrapper functions that yield from a function that does the work inside a try/except, and/or yield from the task in a 'wait_closed' method.
Maybe someone with more asyncio inside knowledge will be able to clear this up, and we can figure out a doc improvement. I suspect we need a "best practices" document about this...
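For illustration, a hedged sketch of the wrapper pattern described above; do_work() and the logging call are placeholders, not names from the original messages.

```python
import asyncio
import logging


async def do_work():
    ...  # the real work, which may raise


async def main_wrapper(loop):
    # Handle failures at a well-defined point instead of leaving them to be
    # reported from the Task finalizer.
    try:
        await do_work()
    except Exception:
        logging.exception('worker failed')
        loop.stop()
```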
> because in that case the handler isn't getting called from the Task __del__
It's possible to fix -- see the attached future.patch. And perhaps this should be fixed in Future.__del__ and Task.__del__ -- ignoring BaseExceptions isn't good.
So, any exception raised in the exception handler will be re-raised via call_soon. I think the message would be clearer if it said that (that the exception comes from the registered exception handler).
But, I'm not sure this is a good idea. Exceptions are ignored in __del__ because they are asynchronous to the currently running code when the exception is executed (because they are triggered by garbage collection)...that is, there's no currently active statement when the exception is raised. Just because asyncio is an async framework doesn't really change this fundamental truth (between explicit yield points, asyncio code is synchronous, that's its big attraction).
Making this change would make asyncio inconsistent with python's normal practice, and I don't (yet?) see a coherent motivation for doing so.
I see your points, but we're talking about BaseExceptions here -- KeyboardInterrupt, SystemExit etc. Those things usually mean that the program has to crash.
Trapping those exceptions in __del__ isn't good.
Another way to address this would be to at least modify call_exception_handler to log when an exception handler raises a BaseException error.
No, you are talking about *all* exceptions, since they all descend from BaseException. Python's normal __del__ handling just prints the ignored exception to stdout, even if it is a BaseException. Why should asyncio be different (other than logging it instead, which would be useful)?
> No, you are talking about *all* exceptions, since they all descend from BaseException. [...]
My bad -- my current patch does indeed re-raise all exceptions. But would you be OK with something like this:
```python
def __del__(self):
    ...
    try:
        self._loop.call_exception_handler(context)
    except Exception as ex:
        # log or re-raise
        logger.log(...)
    except BaseException as ex:
        logger.error(
            'BaseException in Future.__del__, will be re-raised soon',
            exc_info=True)

        def throw():
            raise ex

        self._loop.call_soon(throw)
```
Not until you convince me there is a reason for deviating from Python's normal __del__ handling :) (Or other asyncio developers agree with you and not me.)
Whew. Complex issue!
The OP should probably use his workaround or call loop.stop() instead of raising SystemExit.
Asyncio in general is rather careless about "true" BaseExceptions (i.e. that aren't also Exceptions), we should decide what we want to do for these (there are definitely ways to get in a bad state if you catch one and then resume the loop).
call_exception_handler() tries to log and then ignore exceptions raised by the handler, so the except clause Yuri's patch adds to __del__ in fact will only see true BaseExceptions.
Adding an except clause to __del__ seems very unprincipled -- there's another __del__ that would require the same treatment (all our __del__ methods seem to call call_exception_handler()) and *if* we want to catch it we should probably do it in call_exception_handler().
boom() does in fact yield.
Calling loop.stop() means that I need other, more complex code, to store and return the non-zero exit status.
Using sys.exit means you are depending on garbage collection to clean up all of your program's resources. In the general case this is a bad idea. A better design is to call loop.stop, and then do cleanup (which might involve calling some wait_closed functions via loop.run_until_complete). If you just call sys.exit, your resources may not get cleaned up correctly, or may not get cleaned up correctly somewhat randomly due to the indeterminacies in the order in which garbage collection is done. This may not matter for a simple program, but I find it makes it easier for me if I just do it "the right way" always :)
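A hedged sketch of the shutdown pattern described above; server and its close()/wait_closed() methods stand in for whatever resources the program actually holds.

```python
import asyncio


def exception_handler(loop, context):
    loop.stop()  # stop the loop; cleanup happens below, outside the handler


loop = asyncio.new_event_loop()
loop.set_exception_handler(exception_handler)
server = ...  # illustrative resource, e.g. a server created on this loop
try:
    loop.run_forever()
finally:
    server.close()                                 # explicit cleanup
    loop.run_until_complete(server.wait_closed())  # let cleanup complete
    loop.close()
```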
How about we extend loop.stop() so that you can pass it an exception to raise once the loop is stopped? This exception would then be thrown out of run_forever(). There may be some delay (callbacks already scheduled will run first) but it is how things were meant to be.
FWIW this isn't really enough to ensure cleanup happens before destructors run -- when the loop exits, tasks may still be active unless you keep track of all of them and explicitly cancel them (and run the loop until they have processed the cancellation).
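A hedged sketch of how the proposed extension might work; this is not an existing asyncio API, and the subclass below is made up purely for illustration.

```python
import asyncio


class StoppableLoop(asyncio.SelectorEventLoop):
    """Illustrative only: stop() may carry an exception to raise later."""

    _stop_exc = None

    def stop(self, exc=None):
        self._stop_exc = exc
        super().stop()

    def run_forever(self):
        super().run_forever()
        exc, self._stop_exc = self._stop_exc, None
        if exc is not None:
            raise exc  # thrown out of run_forever() once the loop has stopped
```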
> On Wed, Oct 28, 2015 at 02:49:55PM +0000, R. David Murray wrote:
> Using sys.exit means you are depending on garbage collection to clean up all of your program's resources. In the general case this is a bad idea. [...]
I think it depends on the problem, sometimes "crash-only" software is safer. You have to design for immediate failure anyway, can't stop those pesky hardware failures. So if you always immediately crash there is only one, well-tested path to stopping a program.
Also saves writing a lot of cleanup code ;)
Hmm, I suppose that means I should really be using os._exit(42) to avoid garbage collection...
> On Wed, Oct 28, 2015 at 03:32:36PM +0000, Guido van Rossum wrote:
> How about we extend loop.stop() so that you can pass it an exception to raise once the loop is stopped? [...]
It is better than the current situation, where calling loop.stop() and returning the correct exit code would force me to resort to a global variable.
I would then probably try to write an exception handler like this:
```python
def exception_handler(loop, context):
    loop.stop(context.get('exception', Exception('unknown error')))
```
And then hope I don't lose the traceback!
> FWIW this isn't really enough to ensure cleanup happens before destructors run -- when the loop exits, tasks may still be active unless you keep track of all of them and explicitly cancel them (and run the loop until they have processed the cancellation).
At least in my situation I don't think this is a problem; the objective is to shut down the process quickly to prevent bad things from happening.
OK, I'll wait for someone to submit a patch for this. (I tried a quick hack, but unless we limit this to BaseException instances that aren't also Exception instances it's complicated).
See also bpo-46759.
Andrew, would you be interested in investigating this? I can't even follow the flow through asyncio that causes the observed behavior (though I seem to have confirmed it).
Not sure if the issue is still relevant. asyncio re-raises KeyboardInterrupt explicitly in all loop.call_exception_handler() branches: https://github.com/python/cpython/blob/main/Lib/asyncio/base_events.py#L1759-L1814
That change has existed since Python 3.8 (October 2019); this issue's discussion dates from 2015.
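For context, the re-raising pattern that comment points to looks roughly like this (a paraphrase of the custom-handler branch; see the linked source for the exact code):

```python
try:
    self._exception_handler(self, context)
except (SystemExit, KeyboardInterrupt):
    raise  # let these propagate instead of being logged
except BaseException:
    # Exception in the user-set handler: fall back to the default handler,
    # and log if that also fails.
    ...
```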
Nevertheless, the example code still hangs after calling sys.exit(). I can't quite tell where it is hanging.
With python built from main I get:
```
/Users/guido/test_sys_exit_in_exception_handler.py:12: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
Got error, exiting
Exception ignored in: <bound method BaseEventLoop.call_exception_handler of <_UnixSelectorEventLoop running=True closed=False debug=False>>
Traceback (most recent call last):
  File "/Users/guido/cpython/Lib/asyncio/base_events.py", line 1749, in call_exception_handler
    self._exception_handler(self, context)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guido/test_sys_exit_in_exception_handler.py", line 6, in handler
    sys.exit(42)
    ^^^^^^^^^^^^
SystemExit: 42
```
This is printed from write_unraisable_exc_file(). Earlier comments in this issue indicate that the exception is being raised from a finalizer (__del__ method). But how?
I can describe what happens with test_sys_exit_in_exception_handler.py:
- The 'boom' task raises an exception.
- The task is not awaited, so `Future.__del__` calls the exception handler with a 'Task exception was never retrieved' message.
- The custom handler raises SystemExit.
- SystemExit bubbles up and is swallowed by `__del__`; the `__del__` method cannot re-raise.
The question is: what is the expected behavior?
a) Keep the current behavior: an exception raised by a custom exception handler is swallowed in this particular case (but propagates if loop.call_exception_handler() is called from somewhere other than __del__).
b) Yuri suggested re-scheduling an exception raised from loop.call_exception_handler() via loop.call_soon(). But asyncio.Handle catches it and... calls call_exception_handler() again with an 'Exception in callback ...' message. In the end, we have endless recursion.
c) The asyncio loop could call loop.stop() if an exception is raised by loop.call_exception_handler() from __del__. I think this behavior would be terrible: a subtle error could terminate an asyncio program.
d) Assume that a custom exception handler should not raise an exception: catch all exceptions in call_exception_handler(), call sys.unraisablehook(), and suppress the exception.
I believe that d) is the best thing that we can do here.
I can prepare a fix if we agree on the solution.
Guido, perhaps you had trouble tracking the problem down because asyncio uses the _asyncio C extension by default; it drops some calls from the Python stack trace.
Andrew, thanks for explaining this.
The key thing I was missing was that the root cause of the problem is that Future.__del__ is trying to log an error about the un-awaited task by calling the exception handler directly. That actually feels a little dodgy.
This is why I'm not yet comfortable with (d). Looking at call_exception_handler(), whether it calls the default handler or a custom handler, it explicitly checks for SystemExit and KeyboardInterrupt and re-raises those. And only those -- everything else ends up logging an error.
Which makes me wonder. Maybe that error in Future.__del__ should not call any exception handler at all, but just call logger.error()? Or maybe Future.__del__ should catch exceptions coming out of there and log an error? Maybe a modified version of (d), but only implemented in Future.__del__, not in call_exception_handler()?
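A minimal sketch of the modified (d) idea, assuming Future.__del__ shields itself from anything the handler raises; the context dict mirrors what asyncio builds today, but this is not the actual implementation.

```python
import logging

logger = logging.getLogger('asyncio')


def __del__(self):
    if not self._log_traceback:
        return  # the exception was retrieved; nothing to report
    context = {
        'message': f'{self.__class__.__name__} exception was never retrieved',
        'exception': self._exception,
        'future': self,
    }
    try:
        self._loop.call_exception_handler(context)
    except BaseException:
        # A handler that raises (even SystemExit) cannot usefully propagate
        # out of a finalizer, so log it instead of letting it be swallowed.
        logger.error('Exception in exception handler called from %r.__del__',
                     self, exc_info=True)
```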
The simplest solution is to just let the loop call the exception handler with call_soon rather than calling it manually.
Isn't that (b)? @asvetlov didn't seem to think that would work (see his message above).
No, I propose to call the exception handler with call_soon, as opposed to (b), which is about rescheduling the raised error with call_soon.
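A hedged sketch of that proposal, again using an illustrative Future.__del__: deferring the handler to the loop means a SystemExit raised inside it surfaces from the scheduled callback and can stop run_forever(), instead of dying inside the finalizer.

```python
def __del__(self):
    if not self._log_traceback:
        return
    context = {
        'message': f'{self.__class__.__name__} exception was never retrieved',
        'exception': self._exception,
        'future': self,
    }
    # Schedule the handler instead of invoking it synchronously from the
    # finalizer, so anything it raises is handled by the event loop itself.
    self._loop.call_soon(self._loop.call_exception_handler, context)
```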