cpython icon indicating copy to clipboard operation
cpython copied to clipboard

sys.exit() caught in async event loop exception handler

Open be3361d6-08f5-4239-bc8c-ef6e1846e3fe opened this issue 10 years ago • 31 comments

BPO 25489
Nosy @gvanrossum, @bitdancer, @asvetlov, @1st1, @iritkatriel
Files
  • test_sys_exit_in_exception_handler.py: Test Case
  • future.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-10-27.11:25:55.941>
    labels = ['expert-asyncio']
    title = 'sys.exit() caught in async event loop exception handler'
    updated_at = <Date 2022-03-22.13:11:35.019>
    user = 'https://bugs.python.org/jinty'
    

    bugs.python.org fields:

    activity = <Date 2022-03-22.13:11:35.019>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2015-10-27.11:25:55.941>
    creator = 'jinty'
    dependencies = []
    files = ['40867', '40869']
    hgrepos = []
    issue_num = 25489
    keywords = ['patch']
    message_count = 27.0
    messages = ['253529', '253540', '253543', '253544', '253549', '253552', '253554', '253555', '253556', '253557', '253558', '253576', '253594', '253601', '253603', '253686', '253723', '253746', '415599', '415618', '415619', '415620', '415621', '415622', '415737', '415738', '415742']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'r.david.murray', 'asvetlov', 'yselivanov', 'jinty', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue25489'
    versions = ['Python 3.5']
    

    Running the attached file with python3 shows that SystemExit is caught rather than causing the process to stop. That's quite surprising.

    I expected this function to be parallel to sys.excepthook, but I see that sys.excepthook does not get called if you call sys.exit() at the python prompt. So I guess I agree with you that it is surprising, although I actually expected the opposite (that sys.excepthook would get called for sys.exit).

    bitdancer avatar Oct 27 '15 14:10 bitdancer

    the workaround I am using at the moment is this:

    def handler(loop, context):
        print('Got error, exiting')
        loop.call_soon(sys.exit, 42)
    

    which actually does cause the process to exit

    Ah, I misunderstood your report, because I didn't actually run the example. The exception is being ignored because it is raised during a __del__ method execution. This has nothing to do with set_exception_handler. And in fact if you raise sys.exit in an excepthook handler it is ignored completely, so the cases are parallel in that sense. Interestingly, if you replace your Exception in boom with sys.exit, your sys.exit in the handler actually works, because in that case the handler isn't getting called from the Task __del__.

    I don't think there's a bug in python here, I think the bug is in your program. You aren't yielding from the task, so the exception from boom only gets dealt with during Task finalization. I find this to be the most confusing part of asyncio programming, myself. I don't know why the handler doesn't get called just from creating the Task and running the loop. I often end up writing wrapper functions that yield from a function that does the work inside a try/except, and/or yield from the task in a 'wait_closed' method.

    Maybe someone with more asyncio inside knowledge will be able to clear this up, and we can figure out a doc improvement. I suspect we need a "best practices" document about this...

    bitdancer avatar Oct 27 '15 14:10 bitdancer

    because in that case the handler isn't getting called from the Task __del__

    It's possible to fix -- see the attached future.patch. And perhaps this should be fixed in Future.__del__ and Task.__del__ -- ignoring BaseExceptions isn't good.

    1st1 avatar Oct 27 '15 16:10 1st1

    So, any exception raised in the exception handler will be re-raised via call_soon. I think the message would be clearer if it said that (that the exception comes from the registered exception handler).

    But, I'm not sure this is a good idea. Exceptions are ignored in __del__ because they are asynchronous to the currently running code when the exception is executed (because they are triggered by garbage collection)...that is, there's no currently active statement when the exception is raised. Just because asyncio is an async framework doesn't really change this fundamental truth (between explicit yield points, asyncio code is synchronous, that's its big attraction).

    Making this change would make asyncio inconsistent with python's normal practice, and I don't (yet?) see a coherent motivation for doing so.

    bitdancer avatar Oct 27 '15 16:10 bitdancer

    I see your points, but we're talking about BaseExceptions here -- KeyboardInterrupt, SystemExit etc. Those things usually mean that the program has to crash.

    1st1 avatar Oct 27 '15 17:10 1st1

    Trapping those exceptions in __del__ isn't good.

    Another way to address this would be to at least modify call_exception_handler to log when an exception handler raises a BaseException error.

    1st1 avatar Oct 27 '15 17:10 1st1

    No, you are talking about *all* exceptions, since they all descend from BaseException. Python's normal __del__ handling just prints the ignored exception to stdout, even if it is a BaseException. Why should asyncio be different (other than logging it instead, which would be useful)?

    bitdancer avatar Oct 27 '15 17:10 bitdancer

    No, you are talking about *all* exceptions, since they all descend from BaseException. Python's normal __del__ handling just prints the ignored exception to stdout, even if it is a BaseException. Why should asyncio be different (other than logging it instead, which would be useful)?

    My bad -- my current patch does indeed re-raise all exceptions. But would you be OK with something like this:

            def __del__():
                ...
    
            try:
                self._loop.call_exception_handler(context)
            except Exception as ex:
                # log or re-raise
                logger.log(...)
            except BaseException as ex:
                logger.error(
                    'BaseException in Future.__del__, will be re-raised soon',
                    exc_info=True)
                def throw():
                    raise ex
                self._loop.call_soon(throw)
    

    1st1 avatar Oct 27 '15 17:10 1st1

    Not until you convince me there is a reason for deviating from Python's normal __del__ handling :) (Or other asyncio developers agree with you and not me.)

    bitdancer avatar Oct 27 '15 17:10 bitdancer

    Whew. Complex issue!

    The OP should probably use his workaround or call loop.stop() instead of raising SystemExit.

    Asyncio in general is rather careless about "true" BaseExceptions (i.e. that aren't also Exceptions), we should decide what we want to do for these (there are definitely ways to get in a bad state if you catch one and then resume the loop).

    call_exception_handler() tries to log and then ignore exceptions raised by the handler, so the except clause Yuri's patch adds to __del__ in fact will only see true BaseExceptions.

    Adding an except clause to __del__ seems very unprincipled -- there's another __del__ that would require the same treatment (all our __del__ methods seem to call call_exception_handler()) and *if* we want to catch it we should probably do it in call_exception_handler().

    boom() does in fact yield.

    gvanrossum avatar Oct 28 '15 00:10 gvanrossum

    Calling loop.stop() means that I need other, more complex code, to store and return the non-zero exit status.

    Using sys.exit means you are depending on garbage collection to clean up all of your program's resources. In the general case this is a bad idea. A better design is to call loop.stop, and then do cleanup (which might involve calling some wait_closed functions via loop.run_until_complete). If you just call sys.exit, your resources may not get cleaned up correctly, or may not get cleaned up correctly somewhat randomly due to the indeterminacies in the order in which garbage collection is done. This may not matter for a simple program, but I find it makes it easier for me if I just do it "the right way" always :)

    bitdancer avatar Oct 28 '15 14:10 bitdancer

    How about we extend loop.stop() so that you can pass it an exception to raise once the loop is stopped? This exception would then be thrown out of run_forever(). There may be some delay (callbacks already scheduled will run first) but it is how things were meant to be.

    FWIW this isn't really enough to ensure cleanup happens before destructors run -- when the loop exits, tasks may still be active unless you keep track of all of them and explicitly cancel them (and run the loop until they have processed the cancellation).

    gvanrossum avatar Oct 28 '15 15:10 gvanrossum

    On Wed, Oct 28, 2015 at 02:49:55PM +0000, R. David Murray wrote:

    R. David Murray added the comment:

    Using sys.exit means you are depending on garbage collection to clean up all of your program's resources. In the general case this is a bad idea. A better design is to call loop.stop, and then do cleanup (which might involve calling some wait_closed functions via loop.run_until_complete). If you just call sys.exit, your resources may not get cleaned up correctly, or may not get cleaned up correctly somewhat randomly due to the indeterminacies in the order in which garbage collection is done. This may not matter for a simple program, but I find it makes it easier for me if I just do it "the right way" always :)

    I think it depends on the problem, sometimes "crash-only" software is safer. You have to design for immediate failure anyway, can't stop those pesky hardware failures. So if you always immediately crash there is only one, well-tested path to stopping a program.

    Also saves a lot of cleanup code writing;)

    Hmm, I suppose that means I should really be using os._exit(42) to avoid garbage collection...

    On Wed, Oct 28, 2015 at 03:32:36PM +0000, Guido van Rossum wrote:

    Guido van Rossum added the comment:

    How about we extend loop.stop() so that you can pass it an exception to raise once the loop is stopped? This exception would then be thrown out of run_forever(). There may be some delay (callbacks already scheduled will run first) but it is how things were meant to be.

    It is better than the current situation where to call loop.stop() and have the correct exit code I would need to resort to a global variable.

    I would then probably try to write an exception handler like this:

        def exception_handler(loop, context):
            loop.stop(context.get('exception', Exception('unknown error')))
    

    And then hope I don't lose the traceback!

    FWIW this isn't really enough to ensure cleanup happens before destructors run -- when the loop exits, tasks may still be active unless you keep track of all of them and explicitly cancel them (and run the loop until they have processed the cancellation).

    At least in my situation I don't think this is a problem, the objective is to shut down the process quickly to prevent bad things happening.

    OK, I'll wait for someone to submit a patch for this. (I tried a quick hack, but unless we limit this to BaseException instances that aren't also Exception instances it's complicated).

    gvanrossum avatar Oct 30 '15 16:10 gvanrossum

    See also bpo-46759.

    iritkatriel avatar Mar 20 '22 09:03 iritkatriel

    Andrew, would you be interested in investigating this? I can't even follow the flow through asyncio that causes the observed behavior (though I seem to have confirmed it).

    gvanrossum avatar Mar 20 '22 16:03 gvanrossum

    Not sure if the issue is still relevant. asyncio re-raises KeyboardInterrupt explicitly in all loop.call_exception_handler() branches: https://github.com/python/cpython/blob/main/Lib/asyncio/base_events.py#L1759-L1814

    asvetlov avatar Mar 20 '22 16:03 asvetlov

    The change exists since Python 3.8 (October of 2019) The issue discussion belongs to 2015

    asvetlov avatar Mar 20 '22 16:03 asvetlov

    Nevertheless, the example code still hangs after calling sys.exit(). I can't quite tell where it is hanging.

    gvanrossum avatar Mar 20 '22 16:03 gvanrossum

    With python built from main I get:

    /Users/guido/test_sys_exit_in_exception_handler.py:12: DeprecationWarning: There is no current event loop
      loop = asyncio.get_event_loop()
    Got error, exiting
    Exception ignored in: <bound method BaseEventLoop.call_exception_handler of <_UnixSelectorEventLoop running=True closed=False debug=False>>
    Traceback (most recent call last):
      File "/Users/guido/cpython/Lib/asyncio/base_events.py", line 1749, in call_exception_handler
        self._exception_handler(self, context)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/guido/test_sys_exit_in_exception_handler.py", line 6, in handler
        sys.exit(42)
        ^^^^^^^^^^^^
    SystemExit: 42
    

    This is printed from write_unraisable_exc_file(). Earlier comments in this issue indicate that the exception is being raised from a finalizer (__del__ method). But how?

    gvanrossum avatar Mar 20 '22 16:03 gvanrossum

    I can describe what happens with test_sys_exit_in_exception_handler.py

    1. The 'boom' task raises an exception.
    2. The task is not awaited, Future.__del__ calls the exception handler with 'Task exception was never retrieved' message.
    3. The custom handler raises SystemExit.
    4. SystemExit bubbles up and swallowed by __del__, the __del__ method cannot re-raise.

    The question is: what is the behavior expected? a) Now an exception raised by a custom exception handler is swallowed in this particular case (but is propagated if loop.call_exception_handler() is called from a function other than __del__). b) Yuri suggested re-schedule an exception generated by loop.call_exception_handler by loop.call_soon(). asyncio.Handle catches it and... calls call_exception_handler() with 'Exception in callback ...' message. At the end, we have an endless recursion. c) asyncio loop can call loop.stop() if an exception is raised by loop.call_exception_handler() from __del__. I think this behavior is terrible: a subtle error can terminate asyncio program. d) Assume that a custom exception handler should not raise an exception. Catch all exceptions in call_exception_handler, call sys.unraisablehook(), and suppress the exception.

    I believe that d) is the best thing that we can do here.

    I can prepare a fix if we agree on the solution.

    asvetlov avatar Mar 22 '22 01:03 asvetlov

    Guido, perhaps you had problems with the problem detection because the asyncio uses _asyncio C Extesions by default. It drops some calls from the python stack trace.

    asvetlov avatar Mar 22 '22 01:03 asvetlov

    Andrew, thanks for explaining this.

    The key thing I was missing was that the root cause of the problem is that Future.__del__ is trying to log an error about the un-awaited task by calling the exception handler directly. That actually feels a little dodgy.

    This is why I'm not yet comfortable with (d). Looking at call_exception_handler(), whether it calls the default handler or a custom handler, it explicitly checks for SystemExit and KeyboardInterrupt and re-raises those. And only those -- everything ends up logging an error.

    Which makes me wonder. Maybe that error in Future.__del__ should not call any exception handler at all, but just call logger.error()? Or maybe Future.__del__ should catch exceptions coming out of there and log an error? Maybe a modified version of (d), but only implemented in Future.__del__, not in call_exception_handler()?

    gvanrossum avatar Mar 22 '22 04:03 gvanrossum

    The simplest solution is to just let the loop call the exception handler with call_soon rather than calling it manually.

    kumaraditya303 avatar Aug 16 '22 13:08 kumaraditya303

    Isn't that (b)? @asvetlov didn't seem to think that would work (see his message above).

    gvanrossum avatar Aug 16 '22 17:08 gvanrossum

    No, I propose to call exception handler with call_soon rather than (b) which is about rescheduling the raised error with call_soon.

    kumaraditya303 avatar Aug 16 '22 17:08 kumaraditya303