sys.exit() caught in async event loop exception handler
| BPO | 25489 |
|---|---|
| Nosy | @gvanrossum, @bitdancer, @asvetlov, @1st1, @iritkatriel |
| Files | |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = None
created_at = <Date 2015-10-27.11:25:55.941>
labels = ['expert-asyncio']
title = 'sys.exit() caught in async event loop exception handler'
updated_at = <Date 2022-03-22.13:11:35.019>
user = 'https://bugs.python.org/jinty'
bugs.python.org fields:
activity = <Date 2022-03-22.13:11:35.019>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['asyncio']
creation = <Date 2015-10-27.11:25:55.941>
creator = 'jinty'
dependencies = []
files = ['40867', '40869']
hgrepos = []
issue_num = 25489
keywords = ['patch']
message_count = 27.0
messages = ['253529', '253540', '253543', '253544', '253549', '253552', '253554', '253555', '253556', '253557', '253558', '253576', '253594', '253601', '253603', '253686', '253723', '253746', '415599', '415618', '415619', '415620', '415621', '415622', '415737', '415738', '415742']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'r.david.murray', 'asvetlov', 'yselivanov', 'jinty', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue25489'
versions = ['Python 3.5']
Running the attached file with python3 shows that SystemExit is caught rather than causing the process to stop. That's quite surprising.
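The attached file is not included in the migration. The following is a hedged reconstruction based on the tracebacks and descriptions later in this thread; the handler, the boom() coroutine, and the exit code 42 come from those messages, while everything else (including the modern async/await syntax) is an assumption.

```python
# Reconstructed sketch of test_sys_exit_in_exception_handler.py -- not the
# original attachment.
import asyncio
import sys


def handler(loop, context):
    print('Got error, exiting')
    sys.exit(42)  # expected to terminate the process, but is swallowed


async def boom():
    await asyncio.sleep(0)   # yield to the loop at least once
    raise Exception('boom')  # never retrieved; reported from the finalizer


loop = asyncio.get_event_loop()
loop.set_exception_handler(handler)
loop.create_task(boom())  # deliberately never awaited
loop.run_forever()        # keeps running: the SystemExit never escapes
```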
I expected this function to be parallel to sys.excepthook, but I see that sys.excepthook does not get called if you call sys.exit() at the python prompt. So I guess I agree with you that it is surprising, although I actually expected the opposite (that sys.excepthook would get called for sys.exit).
the workaround I am using at the moment is this:
```python
def handler(loop, context):
    print('Got error, exiting')
    loop.call_soon(sys.exit, 42)
```
which actually does cause the process to exit
Ah, I misunderstood your report, because I didn't actually run the example. The exception is being ignored because it is raised during a __del__ method execution. This has nothing to do with set_exception_handler. And in fact if you raise sys.exit in an excepthook handler it is ignored completely, so the cases are parallel in that sense. Interestingly, if you replace your Exception in boom with sys.exit, your sys.exit in the handler actually works, because in that case the handler isn't getting called from the Task __del__.
I don't think there's a bug in python here, I think the bug is in your program. You aren't yielding from the task, so the exception from boom only gets dealt with during Task finalization. I find this to be the most confusing part of asyncio programming, myself. I don't know why the handler doesn't get called just from creating the Task and running the loop. I often end up writing wrapper functions that yield from a function that does the work inside a try/except, and/or yield from the task in a 'wait_closed' method.
Maybe someone with more asyncio inside knowledge will be able to clear this up, and we can figure out a doc improvement. I suspect we need a "best practices" document about this...
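For illustration, a hedged sketch of the wrapper pattern described above; do_work() and the logging call are placeholders, not names from the original messages.

```python
import asyncio
import logging


async def do_work():
    ...  # the real work, which may raise


async def main_wrapper(loop):
    # Handle failures at a well-defined point instead of leaving them to be
    # reported from the Task finalizer.
    try:
        await do_work()
    except Exception:
        logging.exception('worker failed')
        loop.stop()
```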
> because in that case the handler isn't getting called from the Task __del__
It's possible to fix -- see the attached future.patch. And perhaps this should be fixed in Future.__del__ and Task.__del__ -- ignoring BaseExceptions isn't good.
So, any exception raised in the exception handler will be re-raised via call_soon. I think the message would be clearer if it said that (that the exception comes from the registered exception handler).
But, I'm not sure this is a good idea. Exceptions are ignored in __del__ because they are asynchronous to the currently running code when the exception is executed (because they are triggered by garbage collection)...that is, there's no currently active statement when the exception is raised. Just because asyncio is an async framework doesn't really change this fundamental truth (between explicit yield points, asyncio code is synchronous, that's its big attraction).
Making this change would make asyncio inconsistent with python's normal practice, and I don't (yet?) see a coherent motivation for doing so.
I see your points, but we're talking about BaseExceptions here -- KeyboardInterrupt, SystemExit etc. Those things usually mean that the program has to crash.
Trapping those exceptions in __del__ isn't good.
Another way to address this would be to at least modify call_exception_handler to log when an exception handler raises a BaseException error.
No, you are talking about *all* exceptions, since they all descend from BaseException. Python's normal __del__ handling just prints the ignored exception to stdout, even if it is a BaseException. Why should asyncio be different (other than logging it instead, which would be useful)?
> No, you are talking about *all* exceptions, since they all descend from BaseException. [...]
My bad -- my current patch does indeed re-raise all exceptions. But would you be OK with something like this:
```python
def __del__(self):
    ...
    try:
        self._loop.call_exception_handler(context)
    except Exception as ex:
        # log or re-raise
        logger.log(...)
    except BaseException as ex:
        logger.error(
            'BaseException in Future.__del__, will be re-raised soon',
            exc_info=True)

        def throw():
            raise ex

        self._loop.call_soon(throw)
```
Not until you convince me there is a reason for deviating from Python's normal __del__ handling :) (Or other asyncio developers agree with you and not me.)
Whew. Complex issue!
The OP should probably use his workaround or call loop.stop() instead of raising SystemExit.
Asyncio in general is rather careless about "true" BaseExceptions (i.e. that aren't also Exceptions), we should decide what we want to do for these (there are definitely ways to get in a bad state if you catch one and then resume the loop).
call_exception_handler() tries to log and then ignore exceptions raised by the handler, so the except clause Yuri's patch adds to __del__ in fact will only see true BaseExceptions.
Adding an except clause to __del__ seems very unprincipled -- there's another __del__ that would require the same treatment (all our __del__ methods seem to call call_exception_handler()) and *if* we want to catch it we should probably do it in call_exception_handler().
boom() does in fact yield.
Calling loop.stop() means that I need other, more complex code, to store and return the non-zero exit status.
Using sys.exit means you are depending on garbage collection to clean up all of your program's resources. In the general case this is a bad idea. A better design is to call loop.stop, and then do cleanup (which might involve calling some wait_closed functions via loop.run_until_complete). If you just call sys.exit, your resources may not get cleaned up correctly, or may not get cleaned up correctly somewhat randomly due to the indeterminacies in the order in which garbage collection is done. This may not matter for a simple program, but I find it makes it easier for me if I just do it "the right way" always :)
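A hedged sketch of the shutdown pattern described above; server and its close()/wait_closed() methods stand in for whatever resources the program actually holds.

```python
import asyncio


def exception_handler(loop, context):
    loop.stop()  # stop the loop; cleanup happens below, outside the handler


loop = asyncio.new_event_loop()
loop.set_exception_handler(exception_handler)
server = ...  # illustrative resource, e.g. a server created on this loop
try:
    loop.run_forever()
finally:
    server.close()                                 # explicit cleanup
    loop.run_until_complete(server.wait_closed())  # let cleanup complete
    loop.close()
```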
How about we extend loop.stop() so that you can pass it an exception to raise once the loop is stopped? This exception would then be thrown out of run_forever(). There may be some delay (callbacks already scheduled will run first) but it is how things were meant to be.
FWIW this isn't really enough to ensure cleanup happens before destructors run -- when the loop exits, tasks may still be active unless you keep track of all of them and explicitly cancel them (and run the loop until they have processed the cancellation).
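A hedged sketch of how the proposed extension might work; this is not an existing asyncio API, and the subclass below is made up purely for illustration.

```python
import asyncio


class StoppableLoop(asyncio.SelectorEventLoop):
    """Illustrative only: stop() may carry an exception to raise later."""

    _stop_exc = None

    def stop(self, exc=None):
        self._stop_exc = exc
        super().stop()

    def run_forever(self):
        super().run_forever()
        exc, self._stop_exc = self._stop_exc, None
        if exc is not None:
            raise exc  # thrown out of run_forever() once the loop has stopped
```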
> On Wed, Oct 28, 2015 at 02:49:55PM +0000, R. David Murray wrote:
> Using sys.exit means you are depending on garbage collection to clean up all of your program's resources. In the general case this is a bad idea. [...]
I think it depends on the problem, sometimes "crash-only" software is safer. You have to design for immediate failure anyway, can't stop those pesky hardware failures. So if you always immediately crash there is only one, well-tested path to stopping a program.
Also saves writing a lot of cleanup code ;)
Hmm, I suppose that means I should really be using os._exit(42) to avoid garbage collection...
> On Wed, Oct 28, 2015 at 03:32:36PM +0000, Guido van Rossum wrote:
> How about we extend loop.stop() so that you can pass it an exception to raise once the loop is stopped? [...]
It is better than the current situation, where calling loop.stop() and returning the correct exit code would force me to resort to a global variable.
I would then probably try to write an exception handler like this:
```python
def exception_handler(loop, context):
    loop.stop(context.get('exception', Exception('unknown error')))
```
And then hope I don't lose the traceback!
> FWIW this isn't really enough to ensure cleanup happens before destructors run -- when the loop exits, tasks may still be active unless you keep track of all of them and explicitly cancel them (and run the loop until they have processed the cancellation).
At least in my situation I don't think this is a problem; the objective is to shut down the process quickly to prevent bad things from happening.
OK, I'll wait for someone to submit a patch for this. (I tried a quick hack, but unless we limit this to BaseException instances that aren't also Exception instances it's complicated).
See also bpo-46759.
Andrew, would you be interested in investigating this? I can't even follow the flow through asyncio that causes the observed behavior (though I seem to have confirmed it).
Not sure if the issue is still relevant. asyncio re-raises KeyboardInterrupt explicitly in all loop.call_exception_handler() branches: https://github.com/python/cpython/blob/main/Lib/asyncio/base_events.py#L1759-L1814
That change has existed since Python 3.8 (October 2019); this issue's discussion dates from 2015.
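For context, the re-raising pattern that comment points to looks roughly like this (a paraphrase of the custom-handler branch; see the linked source for the exact code):

```python
try:
    self._exception_handler(self, context)
except (SystemExit, KeyboardInterrupt):
    raise  # let these propagate instead of being logged
except BaseException:
    # Exception in the user-set handler: fall back to the default handler,
    # and log if that also fails.
    ...
```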
Nevertheless, the example code still hangs after calling sys.exit(). I can't quite tell where it is hanging.
With python built from main I get:
```
/Users/guido/test_sys_exit_in_exception_handler.py:12: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
Got error, exiting
Exception ignored in: <bound method BaseEventLoop.call_exception_handler of <_UnixSelectorEventLoop running=True closed=False debug=False>>
Traceback (most recent call last):
  File "/Users/guido/cpython/Lib/asyncio/base_events.py", line 1749, in call_exception_handler
    self._exception_handler(self, context)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guido/test_sys_exit_in_exception_handler.py", line 6, in handler
    sys.exit(42)
    ^^^^^^^^^^^^
SystemExit: 42
```
This is printed from write_unraisable_exc_file(). Earlier comments in this issue indicate that the exception is being raised from a finalizer (__del__ method). But how?
I can describe what happens with test_sys_exit_in_exception_handler.py:
- The 'boom' task raises an exception.
- The task is not awaited, so `Future.__del__` calls the exception handler with a 'Task exception was never retrieved' message.
- The custom handler raises SystemExit.
- SystemExit bubbles up and is swallowed by `__del__`; the `__del__` method cannot re-raise.
The question is: what is the expected behavior?
a) Keep the current behavior: an exception raised by a custom exception handler is swallowed in this particular case (but propagates if loop.call_exception_handler() is called from somewhere other than __del__).
b) Yuri suggested re-scheduling an exception raised from loop.call_exception_handler() via loop.call_soon(). But asyncio.Handle catches it and... calls call_exception_handler() again with an 'Exception in callback ...' message. In the end, we have endless recursion.
c) The asyncio loop could call loop.stop() if an exception is raised by loop.call_exception_handler() from __del__. I think this behavior would be terrible: a subtle error could terminate an asyncio program.
d) Assume that a custom exception handler should not raise an exception: catch all exceptions in call_exception_handler(), call sys.unraisablehook(), and suppress the exception.
I believe that d) is the best thing that we can do here.
I can prepare a fix if we agree on the solution.
Guido, perhaps you had trouble tracking the problem down because asyncio uses the _asyncio C extension by default; it drops some calls from the Python stack trace.
Andrew, thanks for explaining this.
The key thing I was missing was that the root cause of the problem is that Future.__del__ is trying to log an error about the un-awaited task by calling the exception handler directly. That actually feels a little dodgy.
This is why I'm not yet comfortable with (d). Looking at call_exception_handler(), whether it calls the default handler or a custom handler, it explicitly checks for SystemExit and KeyboardInterrupt and re-raises those. And only those -- everything else ends up logging an error.
Which makes me wonder. Maybe that error in Future.__del__ should not call any exception handler at all, but just call logger.error()? Or maybe Future.__del__ should catch exceptions coming out of there and log an error? Maybe a modified version of (d), but only implemented in Future.__del__, not in call_exception_handler()?
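A minimal sketch of the modified (d) idea, assuming Future.__del__ shields itself from anything the handler raises; the context dict mirrors what asyncio builds today, but this is not the actual implementation.

```python
import logging

logger = logging.getLogger('asyncio')


def __del__(self):
    if not self._log_traceback:
        return  # the exception was retrieved; nothing to report
    context = {
        'message': f'{self.__class__.__name__} exception was never retrieved',
        'exception': self._exception,
        'future': self,
    }
    try:
        self._loop.call_exception_handler(context)
    except BaseException:
        # A handler that raises (even SystemExit) cannot usefully propagate
        # out of a finalizer, so log it instead of letting it be swallowed.
        logger.error('Exception in exception handler called from %r.__del__',
                     self, exc_info=True)
```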
The simplest solution is to just let the loop call the exception handler with call_soon rather than calling it manually.
Isn't that (b)? @asvetlov didn't seem to think that would work (see his message above).
No, I propose to call the exception handler with call_soon, as opposed to (b), which is about rescheduling the raised error with call_soon.
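A hedged sketch of that proposal, again using an illustrative Future.__del__: deferring the handler to the loop means a SystemExit raised inside it surfaces from the scheduled callback and can stop run_forever(), instead of dying inside the finalizer.

```python
def __del__(self):
    if not self._log_traceback:
        return
    context = {
        'message': f'{self.__class__.__name__} exception was never retrieved',
        'exception': self._exception,
        'future': self,
    }
    # Schedule the handler instead of invoking it synchronously from the
    # finalizer, so anything it raises is handled by the event loop itself.
    self._loop.call_soon(self._loop.call_exception_handler, context)
```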