
MultiError v2

Open njsmith opened this issue 5 years ago • 34 comments

MultiError is the one part of trio's core design that I'm really not satisfied with. Trying to support multiple exceptions on top of the language's firm one-exception-at-a-time stance raises (heh) all kinds of issues. Some of them can probably be fixed by changing the design. But the hardest problem is that there are lots of third-party packages that want to do custom handling of exception tracebacks (e.g. ipython, pytest, raven/sentry). And right now we have to monkeypatch all of them to work with MultiError, with more or less pain and success.

Now @1st1 wants to add nurseries to asyncio, and as part of that he'll need to add something like MultiError to the stdlib. We talked a bunch about this at PyCon, and the plan is to figure out how to do this in a way that works for both asyncio and trio. Probably we'll start by splitting MultiError off into a standalone library, that trio and uvloop can both consume, and then add that library to the stdlib in 3.8 (and then the library will remain as a backport library for those using trio on 3.7 and earlier). This way asyncio can build on our experience, and trio can get out of the monkeypatching business (because if MultiErrors are in the stdlib, then it becomes ipython/pytest/sentry's job to figure out how to cope with them).

But before we can do that we need to fix the design, and do it in writing so we (and Yury) can all see that the new design is right :-).

Current design

[If this were a PEP I'd talk more about the basic assumptions underlying the design: multiple errors can happen concurrently, you need to preserve that fact, you need to be able to catch some-but-not-all of those errors, you need to make sure that you don't accidentally throw away any errors that you didn't explicitly try to catch, etc. But this is a working doc so I'm just going to dive into the details...]

Currently, trio thinks of MultiError objects as being ephemeral things. It tries as much as possible to simulate a system where multiple exceptions just happen to propagate next to each other. So it's important to keep track of the individual errors and their tracebacks, but the MultiError objects themselves are just a detail needed to accomplish this.

So, we only create MultiErrors when there are actually multiple errors – if a MultiError has only a single exception, we "collapse" it, so MultiError([single_exc]) is single_exc. The basic primitive for working with a MultiError is the filter function, which is really a weird flat-map-ish kind of thing: it runs a function over each of the "real" exceptions inside a MultiError, and can replace or remove any of them. If this results in any MultiError object that has zero or one child, then filter collapses it. And the catch helper, MultiError.catch, is a thin wrapper for filter: it catches an exception, then runs a filter over it, and then reraises whatever is left (if anything).
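
For concreteness, here's roughly what the current primitives look like in use (a sketch; the handler contract is as described above: return the exception to keep it, return a different exception to replace it, or return None to drop it):

import trio

def drop_cancelled(exc):
    # filter-style handler: called once per "real" (leaf) exception
    return None if isinstance(exc, trio.Cancelled) else exc

# catch() wraps filter(): catch the error, run the handler over every leaf,
# then re-raise whatever is left (if anything)
with trio.MultiError.catch(drop_cancelled):
    ...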

One more important detail: traceback handling. When you have a nested collection of MultiErrors, e.g. MultiError([RuntimeError(), MultiError([KeyError(), ValueError()])]), then the leaf exceptions' __traceback__ attr holds the traceback for the frames where they traveled independently before meeting up to become a MultiError, and then each MultiError object's __traceback__ attr holds the frames that that particular MultiError traversed. This is just how Python's __traceback__ handling works; there's no way to avoid it. But that's OK, it's actually quite convenient – when we display a traceback, we don't want to say "exception 1 went through frames A, B, C, D, and independently, exception 2 went through frames A', B, C, D" – it's more meaningful, and less cluttered, to say "exception 1 went through frame A, and exception 2 went through frame A', and then they met up and together they went through frames B, C, D". The way __traceback__ data ends up distributed over the MultiError structure makes this structure really easy to extract.

Proposal for new design

Never collapse MultiErrors. Example:

async def some_func():
    async with trio.open_nursery() as nursery:
        async with trio.open_nursery() as nursery:
            raise RuntimeError

If you do await some_func() then currently you get a RuntimeError; in this proposal, you'll instead get a MultiError([MultiError([RuntimeError()])]).

Get rid of filter, and replace it with a new primitive split. Given an exception and a predicate, split splits the exception into one half representing all the parts of the exception that match the predicate, and another half representing all the parts that don't match. Example:

match, rest = MultiError.split(
  # simple predicate: isinstance(exc, RuntimeError)
  RuntimeError,
  # The exception being split:
  MultiError([
    RuntimeError("a"),
    MultiError([
      RuntimeError("b"),
      ValueError(),
    ]),
  ])
)

# produces:

match = MultiError([RuntimeError("a"), MultiError([RuntimeError("b")])])
rest = MultiError([MultiError([ValueError()])])

The split operation always takes an exception type (or tuple of types) to match, just like an except clause. It should also take an optional arbitrary function predicate, like match=lambda exc: ....

If either match or rest is empty, it gets set to None. It's a classmethod rather than a regular method so that you can still use it in cases where you have an exception but don't know whether it's a MultiError or a regular exception, without having to check.
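
A sketch of what that might look like in use (split and its match= argument are the API proposed here, not something that exists yet; log is just a stand-in logger):

import errno
import logging

log = logging.getLogger(__name__)

def handle_connection_resets(exc):
    # exc may be a plain exception or a MultiError; split copes with either
    matched, rest = MultiError.split(
        OSError,
        exc,
        match=lambda e: e.errno == errno.ECONNRESET,  # optional extra predicate
    )
    if matched is not None:
        log.warning("connection reset: %r", matched)
    if rest is not None:
        raise rest  # everything we didn't ask for keeps propagating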

Catching MultiErrors is still done with a context manager, like with MultiError.catch(RuntimeError, handler). But now, catch takes a predicate + a handler (as opposed to filter, which combines these into one thing), uses the predicate to split any caught error, and then if there is a match it calls the handler exactly once, passing it the matched object.

Also, support async with MultiError.acatch(...) so you can write async handlers.
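
Hypothetical usage with the proposed signatures (both catch and acatch are the new API being proposed; send_error_report is just a placeholder for some async call):

import logging

log = logging.getLogger(__name__)

# sync handler: receives the matched part (possibly itself a MultiError)
with MultiError.catch(RuntimeError, log.exception):
    ...

# async handler, e.g. to report the failure somewhere
async def report(matched):
    await send_error_report(matched)  # placeholder async call

async with MultiError.acatch(RuntimeError, report):
    ...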

Limitations of the current design, and how this fixes them

Collapsing is not as helpful to users as you might think

A "nice" thing about collapsing out MultiErrors is that most of the time, when only one thing goes wrong, you get a nice regular exception and don't need to think about this MultiError stuff. I say "nice", but really this is... bad. When you write error handling code, you want to be prepared for everything that could happen, and this design makes it very easy to forget that MultiError is a possibility, and hard to figure out where MultiError handling is actually required. If the language made handling MultiErrors more natural/ergonomic, this might not be as big an issue, but that's just not how Python works. So Yury is strongly against the collapsing design, and he has a point.

Basically, seeing MultiError([RuntimeError()]) tells you "ah, this time it was a single exception, but it could have been multiple exceptions, so I'd better be prepared to handle that".

This also has the nice effect that it becomes much easier to teach people about MultiError, because it shows up front-and-center the first time you have an error inside a nursery.

One of my original motivations for collapsing was that trio.run has a hidden nursery (the "system nursery") that the main task runs inside, and if you do trio.run(main) and main raises RuntimeError, I wanted trio.run to raise RuntimeError as well, not MultiError([RuntimeError()]). But actually this is OK, because the way things have worked out, we never raise errors through the system nursery anyway: either we re-raise whatever main raised, or we raise TrioInternalError. So my concern was unfounded.

Collapsing makes traceback handling more complicated and fragile

Collapsing also makes the traceback handling code substantially more complicated. When filter simplifies a MultiError tree by removing intermediate nodes, it has to preserve the traceback data those nodes held, which it does by patching it into the remaining exceptions. (In our example above, if exception 2 gets caught, then we patch exception 1's __traceback__ so that it shows frames A, B, C, D after all.) This all works, but it makes the implementation much more complex. If we don't collapse, then we can throw away all the traceback patching code: the tracebacks can just continue to live on whichever object they started out on.

Collapsing also means that filter is a destructive operation: it has to mutate the underlying exception objects' __traceback__ attributes in place, so you can't like, speculatively run a filter and then change your mind and go back to using the original MultiError. That object still exists but after the filter operation it's now in an inconsistent state. Fine if you're careful, but it'd be nicer if users didn't have to be careful. If we don't collapse, then this isn't an issue: split doesn't have to mutate its input (and neither would filter, if we were still using filter).

Collapsing loses __context__ for intermediate MultiError nodes

Currently, Trio basically ignores the __context__ and __cause__ attributes on MultiError objects. They don't get assigned any semantics, they get freely discarded when collapsing, and they often end up containing garbage data. (In particular, if you catch a MultiError, handle part of it, and re-raise the remainder... the interpreter doesn't know that this is semantically a "re-raise", and insists on sticking the old MultiError object onto the new one's __context__. We have countermeasures, but it's all super annoying and messy.)

It turns out though that we do actually have a good use for __context__ on MultiErrors. It's super not obvious, but consider this scenario: you have two tasks, A and B, executing concurrently in the same nursery. They both crash. But! Task A's exception propagates faster, and reaches the nursery first. So the nursery sees this, and cancels task B. Meanwhile, task B has blocked somewhere – maybe it's trying to send a graceful shutdown message from a finally: block or something. The cancellation interrupts this, so now task B has a Cancelled exception propagating, and that exception's __context__ is set to the original exception in task B. Eventually, the Cancelled exception reaches the nursery, which catches it. What happens to task B's original exception?

Currently, in Trio, it gets discarded. But it turns out that this is not so great – in particular, it's very possible that task A and B were working together on something, task B hit an error, and then task A crashed because task B suddenly stopped talking to it. So here task B's exception is the actual root cause, and task A's exception is detritus. At least two people have hit this in Trio (see #416 and https://github.com/python-trio/pytest-trio/issues/30).

In the new design, we should declare that a MultiError object's __context__ holds any exceptions that were preempted by the creation of that MultiError, i.e., by the nursery getting cancelled. We'd basically just look at the Cancelled objects, and move their __context__ attributes onto the MultiError that the nursery was raising. But this only works if we avoid collapsing.
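
In pseudocode, the nursery's error-assembly step might look something like this (just a sketch of the idea, not actual Trio code):

def assemble_nursery_error(child_excs):
    # An exception that was already propagating in a task when our cancellation
    # interrupted it ends up as __context__ on that task's Cancelled.
    preempted = [
        exc.__context__
        for exc in child_excs
        if isinstance(exc, Cancelled) and exc.__context__ is not None
    ]
    combined = MultiError(child_excs)
    if preempted:
        # Keep the preempted error(s) reachable instead of silently dropping them.
        combined.__context__ = (
            preempted[0] if len(preempted) == 1 else MultiError(preempted)
        )
    return combined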

It would be nice if tracebacks could show where exceptions jumped across task boundaries

This has been on our todo list forever. It'd be nice if we could like... annotate tracebacks somehow?

If we stopped collapsing MultiErrors, then there's a natural place to put this information: each MultiError corresponds to a jump across task boundaries, so we can put it in the exception string or something. (Oh yeah, maybe we should switch MultiErrors to having associated message strings? Currently they don't have that.)

Filtering is just an annoying abstraction to use

If I wanted to express exception catching using a weird flat-map-ish thing, I'd be writing haskell. In Python it's awkward and unidiomatic. But with filter, it's necessary, because you could have any number of exceptions you need to call it on.

With split, there's always exactly 2 outputs, so you can perform basic MultiError manipulations in straight-line code without callbacks.

Tracebacks make filtering even more annoying to use than it would otherwise be

When filter maps a function over a MultiError tree, the exceptions passed in are not really complete, standalone exceptions: they only have partial tracebacks attached. So you have to handle them carefully. You can't raise or catch them – if you did, the interpreter would start inserting new tracebacks and make a mess of things.

You might think it was natural to write a filter function using a generator, like how @contextmanager works:

def catch_and_log_handler():
    try:
        yield
    except Exception as exc:
        log.exception(exc)

def normalize_errors_handler():
    try:
        yield
    except OtherLibraryError as exc:
        raise MyLibraryError(...) from exc

But this can't work, because the tracebacks would get all corrupted. Instead, handlers take exceptions as arguments, and return either that exception object, or a new exception object (like MyLibraryError).

If a handler function does raise an exception (e.g., b/c of a typo), then there's no good way to cope with that. Currently Trio's MultiError code doesn't even try to handle this.

In the proposed design, all of these issues go away. The exceptions returned by split are always complete and self-contained. Probably for MultiError.catch we still will pass in the exception as an argument instead of using a generator and .throwing it – the only advantage of the .throw is that it lets you use an except block to say which exceptions you want to catch, and with the new MultiError.catch we've already done that before we call the handler. But we can totally allow raise as a way to replace the exception, or handle accidental exceptions. (See the code below for details.)
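
So a handler that wants to replace the whole matched group with a library-specific error could just raise, something like this (a sketch; OtherLibraryError and MyLibraryError are the placeholder names from the example above):

def to_library_error(matched):
    # `matched` is the part of the caught error that matched OtherLibraryError;
    # raising here replaces it, and `from` keeps the original reachable.
    raise MyLibraryError("wrapped operation failed") from matched

with MultiError.catch(OtherLibraryError, to_library_error):
    ...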

Async catching

Currently we don't have an async version of filter or catch (i.e., one where the user-specified handler can be async). Partly this is because when I was first implementing this I hit an interpreter bug that made it not work, but it's also because filter's implementation is extremely complicated and maintaining two copies makes it that much worse.

With the new design, there's no need for an async split, and I think the new catch logic makes supporting both sync and async easy (see below).

Details

# A classic "catch-all" handler, very common in servers
with MultiError.catch(Exception, logger.exception):
    ...

# is equivalent to:

try:
    ...
except BaseException as exc:
    caught, rest = MultiError.split(Exception, exc)
    if caught is None:
        raise
    try:
        logger.exception(caught)
    except BaseException as exc:
        # The way we set exc.__context__ here isn't quite right...
        # Ideally we should stash it in the interpreter's implicit exception
        # handling context state before calling the handler.
        if rest is None:
            try:
                raise
            finally:
                exc.__context__ = caught
        else:
            exc.__context__ = caught
            new_exc = MultiError([exc, rest])
            try:
                raise new_exc
            finally:
                new_exc.__context__ = None
                # IIRC this traceback fixup doesn't actually work b/c
                # of interpreter details, but maybe we can fix that in 3.8
                new_exc.__traceback__ = new_exc.__traceback__.tb_next
    else:
        if rest is not None:
            orig_context = rest.__context__
            try:
                raise rest
            finally:
                rest.__context__ = orig_context
                # See above re: traceback fixup
                rest.__traceback__ = rest.__traceback__.tb_next

Notes:

As noted in comments, __context__ and __traceback__ handling is super finicky and has subtle bugs. Interpreter help would be very... helpful.

Notice that all the logic around the logger.exception call is always synchronous and can be factored into a context manager, so we can do something like:

def catch(type, handler, match=None):
    with _catch_impl(type, match=match) as caught:
        handler(caught)

async def acatch(type, handler, match=None):
    with _catch_impl(type, match=match) as caught:
        await handler(caught)

Other notes

Subclassing

Do we want to support subclassing of MultiError, like class NurseryError(MultiError)? Should we encourage it?

If so, we need to think about handling subclasses when cloning MultiErrors in .split.

I think we should not support this though. Because, we don't have, and don't want, a way to distinguish between a MultiError([MultiError([...])]) and a MultiError([NurseryError([...])]) – we preserve the structure, it contains metadata, but still it's structural. split and catch still only allow you to address the leaf nodes. And that's important, because if we made it easy to match on structure, then people would do things like try to catch a MultiError([MultiError([RuntimeError()])]), when what they should be doing is trying to catch one-or-more-RuntimeErrors. The point of keeping the MultiErrors around instead of collapsing is to push you to handle this case, not continue to hard-code assumptions about there being only a single error.

Naming

MultiError isn't bad, but might as well consider other options while we're redoing everything. AggregateError? CombinedError? NurseryError?

Relevant open issues

#408, #285, #204, #56, https://github.com/python-trio/pytest-trio/issues/30

njsmith avatar Aug 18 '18 12:08 njsmith

Thanks for writing this up! I'm indeed thinking about how we can implement MultiErrors in asyncio, and the more I think about this the more I'm leaning towards not adding a full-blown MultiError semi-standard API at all. What I think would be enough instead is to have a new standard attribute on Exception objects to represent a list of exceptions this exception instance wraps. Let's assume we name that new attribute __errors__. If Python's error displayhook sees an exception with __errors__ it knows that it will have to render all of them. This is it. It's now up to libraries (Trio, pytest, asyncio, etc) how exactly they want to design their MultiErrors; it's up to tools (like Sentry) how exactly they want to render multi-error-like exception instances. I think this would be a relatively straightforward change for Python 3.8 that would be easy to push through python-ideas/python-dev. Pushing a full MultiError API, on the other hand, will be a long and excruciating experience.

1st1 avatar Aug 18 '18 21:08 1st1

@1st1 Hmm, yeah, our strategy there is an important question. I'm not sure I agree that pushing __errors__ through will be the easiest approach – that's changing one of the language's core protocols, and in a sort of vague way, while by contrast we can spin MultiError as adding a single concrete class that's needed by a concrete stdlib user. And even better if we can spin it as "here's an existing library, it already works, no need to bikeshed the details, we're just moving it into the stdlib so asyncio can use it". (There's plenty of precedent for this, e.g. ipaddress.)

Anyway, there'll be plenty of time later to decide on our python-dev strategy. For right now... you're going to need a "full-blown MultiError API" in any case, right, in some namespace or another? And IIRC you were hoping to prototype its usage in uvloop, right, so its initial implementation will be outside the stdlib? I think it'd be great to collaborate on this so it works for both asyncio and trio, and for right now it seems like the way to do that is to make a multierror library on pypi and use it for prototyping, even if that might not be the form we eventually use to present it to python-dev. What do you think?

njsmith avatar Aug 19 '18 02:08 njsmith

(I love the new profile picture by the way, that's a great photo)

njsmith avatar Aug 19 '18 02:08 njsmith

Played around with this a bit tonight and re:

It'd be nice if we could like... annotate tracebacks somehow? If we stopped collapsing MultiErrors, then there's a natural place to put this information: each MultiError corresponds to a jump across task boundaries, so we can put it in the exception string or something. (Oh yeah, maybe we should switch MultiErrors to having associated message strings? Currently they don't have that.)

...it immediately became clear that what you actually want is not the ability to annotate a MultiError, but rather to annotate each of the exceptions inside the MultiError. Like, "this came from child task X", "this came from the main task".

njsmith avatar Aug 20 '18 09:08 njsmith

Looking at @belm0's example code here, I realized that this proposal is also going to complicate code that "hides" a nursery inside a custom context manager: https://github.com/HyperionGray/trio-websocket/issues/20#issuecomment-421504948

This code looks innocuous:

async with open_websocket("https://...") as ws:
    raise ValueError

but if open_websocket has a nursery hidden inside it, then you'll get a MultiError([ValueError]) instead of a ValueError, and that's going to surprise people, because it's mostly an implementation detail that open_websocket creates a nursery.

We can't hide this entirely, because of #264 – if open_websocket has a nursery inside it, then you can't yield inside an open_websocket. And to some extent, this is unavoidable, because it is an inescapable fact of life that if open_websocket has a background task, that task might crash, and then we need to cope with propagating that error out.

But, for the open_websocket case, there is another pattern possible: we could say that well, we've carefully designed our websocket library so that the background task should never crash – if it encounters an error, then it catches it and arranges for the error to be delivered the next time someone tries to send or receive. Of course, it's still possible for the background task to crash due to a bug in the websocket library... but that's a bug, so it can be handled a bit differently than a regular error. In this case, it might make sense to say:

  • An unhandled exception inside the body of the async with block propagates through without being wrapped in a MultiError
  • If there's an unhandled exception in a background task (which should never happen, and indicates a bug in our library), we signal that by converting it into a MyWebsocketLibraryInternalError (which might be-a or have-a MultiError)

In fact, this is exactly the pattern used by Trio's internal "system nursery" (with the main task playing the role of the async with block body, and TrioInternalError playing the part of MyWebsocketLibraryInternalError).

Maybe it should be something we support explicitly, e.g. with a custom nursery type that implements the above logic, or some sort of setting passed to open_nursery.

njsmith avatar Sep 15 '18 01:09 njsmith

ExceptionGroup([SystemExit()]) needs special handling in the interpreter (the same special handling that SystemExit gets, which is something like: don't print a traceback, and if the payload is an integer then use that as the process exit code, otherwise print it?).

This is a nice example too because it forces us to think about how we can access attributes of embedded exceptions.

njsmith avatar Sep 30 '18 21:09 njsmith

Subprocesses might be too big of a first bite for me, but I'd like to take a stab at this (much easier since it should be doable in pure Python). Here is my work in progress. I would like to add more tests of the contract before trying to integrate it into Trio. Do you have any ideas for Trio-agnostic tests? Any other comments? How best can I proceed to fix this issue?

thejohnfreeman avatar Oct 02 '18 03:10 thejohnfreeman

@thejohnfreeman Sorry I didn't get back to you before! This is a high priority but I started writing up a bunch of notes from the Python core sprints a few weeks ago, stalled out, and then have been stuck on that..... let me finish those up real quick and push my totally incomplete prototype to https://github.com/python-trio/exceptiongroup , and then can compare notes and figure out how to move this forward :-)

njsmith avatar Oct 08 '18 11:10 njsmith

Notes from the python core summit

@1st1 and I spent a bunch of time talking about this at the python core summit a few weeks ago; these are my notes from that.

The initial goal is to get enough consensus that we can start writing code and get some experience. So all of the conclusions below are tentative and subject to change, but it's looking like we have enough to get started: https://github.com/python-trio/exceptiongroup

Topics discussed

Should MultiError inherit from BaseException, or from Exception? (Or something else?)

Quick review: BaseException is the base class for all raise-able objects in Python. Exception is the base class for exceptions that should be caught by catch-all handlers, which is most but not quite all of them:

In [1]: BaseException.__subclasses__()                                          
Out[1]: [Exception, GeneratorExit, SystemExit, KeyboardInterrupt]

Also, trio.Cancelled is a BaseException, and asyncio is planning to make asyncio.CancelledError into a BaseException in the next release.

In Trio, MultiError inherits from BaseException. Yury is worried about this, because of backcompat: current asyncio code uses try: ... except Exception: ... as a generic catch-all handler. If asyncio.TaskGroup starts wrapping regular Exceptions into MultiError, and MultiError is a BaseException, then that means it will start "promoting" errors from regular Exceptions into BaseExceptions, and that breaks existing catch-all handlers. (E.g., aiohttp has 32 instances of except Exception – many of them don't have async code inside them, but they would all need auditing.) Trio doesn't really have this issue the same way, because there's less existing code and it's been using BaseException since the beginning.

However, if MultiError is an Exception, then it can only contain Exception subclasses, because otherwise a MultiError(KeyboardInterrupt) could get caught by an except Exception handler, which would be bad. So if you make MultiError into an exception, then you have to figure out what to do with KeyboardInterrupt and the other BaseException types.

At least with regard to catching exceptions, the general principle is: a MultiError object A can be an instance of exception type B (isinstance(A, B)) if and only if every exception inside the MultiError is also an instance of B (all(isinstance(exc, B) for exc in A)). So for example, it would be OK for a MultiError([OSError, OSError]) to itself be an instance of OSError, and be caught by except OSError blocks. But MultiError([OSError, ValueError]) can't be an OSError or a ValueError, but only an Exception. And MultiError([KeyboardInterrupt, OSError]) can't be an OSError or even an Exception, but only a BaseException.

In principle we could dynamically set each MultiError object's base classes when it's created, based on this rule. But this doesn't seem very useful in practice... a MultiError([OSError, OSError]) could be caught by except OSError:, but it wouldn't have an .errno attribute. Also, the whole argument for why we don't want to "collapse" single-entry MultiErrors (MultiError([OSError]) → bare OSError) is that we want people to write code to handle what could happen, rather than what did happen. Letting them use except OSError to catch a MultiError([OSError, OSError]) would violate this: because next time it might be a MultiError([OSError, ValueError]), and now their except block doesn't work.

So trying to get super clever here doesn't seem to be a good idea. If we do anything here, it seems like it should be a special case targeted specifically at those except Exception handlers. So then we would need a plan for how to handle BaseExceptions inside nurseries.

It seems like the two options are:

  • Make MultiError a BaseException, like Trio has done
  • Add a bunch of special-case code inside the nursery exception propagation logic to propagate BaseExceptions in a special way. E.g.: if there's a BaseException and a regular Exception, then the Exception gets discarded and the BaseException gets propagated, unwrapped. And if there are multiple BaseExceptions, then apply some hard-coded ad hoc rules, like KeyboardInterrupt beats GeneratorExit, which beats asyncio.CancelledError.

Yury is tentatively planning to take the second approach in asyncio. I don't want to do that in Trio; too much ad-hackery. So is there a way we can still share our toys?

Plan: make a BaseException version of MultiError that's built into Python. Then asyncio can do:

class TaskGroupError(MultiError, Exception):
    ...

to make a version of MultiError that's caught by except Exception and that can only contain Exception objects.

This may mean that trio-asyncio needs to do some extra translation of these errors when they pass from trio → asyncio, but since it already has to do that for cancellation exceptions then this shouldn't be a showstopper.

If we have multiple MultiError subclasses, there was some question of what kind of semantics to assign that. The concern is: any code for catching exceptions inside MultiError objects needs to know how to break a MultiError into its constituent pieces. Therefore, you really don't want to assign special meaning to the MultiError subclass itself – Trio's exception catching logic needs to be able to tear apart an asyncio MultiError without knowing what it is, and vice-versa. The consensus was that MultiError (and subclasses thereof) should be thought of as pure transparent containers – they carry information about how an exception or exceptions propagated, like the traceback; but, they shouldn't change how you interpret what happened, or catch/handle the exception. (This is also like tracebacks.)

In this context there was also some discussion of whether we wanted a generic protocol ("any exception object can contain sub-exceptions by defining an __exceptions__ attribute") versus a concrete type. But given that these objects can't carry extra semantics, it doesn't make sense to have e.g. an OSError that also has sub-exceptions inside it. And it turns out that even though trio and asyncio might use different types, they can both be handled by a single concrete type + subclasses. So having a single concrete class seems like the way to go.

What goes into Python?

Yury thinks we might want to be really minimal with what actually goes into Python: like maybe just the exception type and the traceback printers, but leave split and catch out for now. (Also, he really really dislikes the idea of using a with statement to catch these things, and really really wants to find some alternative. But it's very difficult to come up with something better; we spent some time trying and didn't really succeed.) So for now I'm going to go ahead and put split and with catch(...) in our prototype library so we can at least try them, and we can adjust later if we need to. But in deference to Yury, I'll make them top-level functions, rather than classmethods :-).

Name

I've been saying MultiError all this time, but that's actually not what we're going to use.

Guido disliked the Multi part, because they can (in the new design) hold a single exception.

After scrutinizing all the BaseException stuff, I'm uncomfortable with Error, since the Python convention is that that's used specifically for Exception subclasses (and not even all of them, just the ones that indicate real errors – see StopIteration).

We did some brainstorming during lunch; I suggested ExceptionGroup; Guido said "oo, now you're talking". So I guess that's what we're going with for now :-).

njsmith avatar Oct 08 '18 11:10 njsmith

@thejohnfreeman OK, so I pushed my very-rough-draft code to the exceptiongroup repo so you can at least see it! I think we'll want to move over there and use that name. But, I don't know what code we'll want to use – I haven't really had a chance to look at yours yet. So I guess the next step is for someone to read through both sets of code and figure out which parts are most worth salvaging, and then moving forward in whatever direction makes sense? If you're still interested in working on this, then I guess that's what I'd suggest doing next? (I also want to work on it, but I'm juggling a lot of things, so if you want to take the lead on that it's fine with me :-).)

njsmith avatar Oct 08 '18 11:10 njsmith

As further evidence that we need to stop "collapsing" single-item MultiError's, here's a real-world account of how this bit @belm0: https://github.com/python-trio/trio/issues/800

njsmith avatar Dec 11 '18 22:12 njsmith

Cases of MultiError in my app tend to be Cancelled combined with something else, and so far I always want to defer to Cancelled. I'm using this pattern:

try:
    ...
except Foo:
    ...
except MultiError as e:
    raise MultiError.filter(lambda exc: None if isinstance(exc, Foo) else exc, e)

belm0 avatar Dec 16 '18 11:12 belm0

I wonder if we can improve the ergonomics (i.e., put the exception handler below the code it handles exceptions in, like Python normally does) with something like:

try:
    # code
except:
    @ExceptionGroup.handle_each(FooException)
    def catch(exc):
        # log, raise another exception (which would chain), etc

Which would be implemented along the lines of

class ExceptionGroup:
    ...
    @staticmethod
    def handle_each(type, match=None):
        def decorate(catcher):
            with ExceptionGroup.catch(type, catcher, match=match):
                raise
        return decorate

oremanj avatar Jan 10 '19 21:01 oremanj

Or possibly even

try:
    ...
except:
    @ExceptionGroup.handle(FooException)
    def handler(exc):
        ...

I'm not sure how to extend this to chain multiple blocks together though. Perhaps:

try:
    ...
except:
    handler_chain = exceptiongroup.HandlerChain()
    @handler_chain(FooException)
    def handler(exc):
        ...
    @handler_chain(BarException)
    def handler(exc):
        ...
    handler_chain.run()

njsmith avatar Jan 10 '19 21:01 njsmith

We could let handle() take multiple types, and let users use whatever normal Python method of case-by-types they like:

try:
    ...
except:
    @ExceptionGroup.handle(FooException, BarException)
    def handler(exc):
        try:
            raise exc
        except FooException:
            ...
        except BarException:
            ...

I guess this adds a traceback frame that could be avoided with @handler_chain?

oremanj avatar Jan 10 '19 22:01 oremanj

@oremanj That doesn't work with the v2 design, because exc here will still contain an ExceptionGroup, just one that's been split to only contain exceptions of the specified type.

njsmith avatar Jan 10 '19 22:01 njsmith

Oh! I had assumed that ExceptionGroup.catch() would invoke the handler once for each leaf exception that matches the filter. I would find the other behavior pretty surprising, and I don't see how I could use it easily in a situation where I cared about the attributes of the exceptions.

I suspect we might want interfaces for both "handle everything in the ExceptionGroup that's a subtype of X" (for logging errors) and "handle each atomic exception that's a subtype of X separately" (for more specific handling). Maybe there's something I'm overlooking though...

oremanj avatar Jan 10 '19 23:01 oremanj

Yeah, attribute access is something of an open issue. The problem is that individual leaf exceptions aren't really usable as standalone objects, because they only have part of the traceback. So maybe we can have an iteration API for accessing attributes or something? But we definitely never want to do raise leaf_exc. (There are more details in the first post, if you search for "filter".)

njsmith avatar Jan 10 '19 23:01 njsmith

That makes sense, but it seems resolvable to me? Tracebacks are basically immutable once created; raising an exception adds new frames and modifies __traceback__ to point to the new head of the traceback, but the previous value of __traceback__ refers to the same traceback as it did before the raise. Seems like the ExceptionGroup interface that calls the user's handler could save the traceback beforehand and restore it after if the handler raises the same exception that was passed in.

oremanj avatar Jan 11 '19 00:01 oremanj

@oremanj you can save/restore, sure, but what do you do with the added frame? normally python adds a frame to show you that the exception was caught and re-raised... and what if a different exception is raised, so now you want to set the original exception as the __context__ for the new exception?

njsmith avatar Jan 11 '19 02:01 njsmith

Split the discussion of decorator-based catch APIs off into https://github.com/python-trio/exceptiongroup/issues/5

njsmith avatar Jan 21 '19 01:01 njsmith

@njsmith asked me to write up my exception handling use case. (I was intending to post something on the Trio forum anyway.)

What I learned the hard way was to avoid having my async APIs communicate by exception. It's attractive and a nice way to break down and compose certain problems, but dealing with the resulting MultiErrors is error-prone and confusing (in current Trio), and I can't expect the casual programmers using my API to deal with it.

So I do have one API employing exceptions internally, but wrap that in a non-throwing API for my users. Let's say it's a robot control library with internal async primitives like advance(), retreat(), turn_in_place(), etc., which may raise exceptions when various sensors detect an obstruction, motor overload, etc. I've been careful to structure things so that only one exception can be raised at a time (i.e. raise is done only within the task calling the primitive, not by child tasks). Then I compose these primitives to make a non-throwing function like wander() which advances until there is an obstacle, turns in place to find a new heading, etc.

So I'm using exceptions but keeping things simple by having only one exception in flight, thereby avoiding MultiError. I can freely use native try/except, and everything was great – until I began leaning on move_on_after and other cancel scopes. Cancelled can happen at the same time as one of my API's exceptions by chance, and the MultiError sad story (#800) ensued.

I think https://github.com/python-trio/trio/issues/611#issuecomment-447634850 was overlooked, but for my (perhaps quite limited) use case, given a MultiError of one of my API exceptions and Cancelled, it suffices to always defer to the latter. I suppose it's because my API's exceptions are a form of informational method exit, so if the parent scope was cancelled it's fine to drop them. So I use MultiError filtering to achieve this, but find it quite cumbersome. But at least it lets me continue using plain try/except for my API's exceptions.

And as I wrote today https://github.com/python-trio/trio/issues/408#issuecomment-496338103, I haven't found MultiError.catch() useful because it doesn't let me look at the exception list before deciding what to do. (E.g. only filter exceptions when Cancelled is present.)

Actually today I've just distilled the concept into a defer_to_cancelled() context manager:

from contextlib import contextmanager
from typing import Type

import trio

@contextmanager
def defer_to_cancelled(*args: Type[Exception]):
    """Context manager which defers MultiError exceptions to Cancelled.

    In the scope of this context manager, any raised trio.MultiError exception
    which is a combination of the given exception_types and trio.Cancelled
    will have the exception_types filtered, leaving only a Cancelled exception.

    The intended use is where routine exceptions (e.g. which are part of an API)
    might occur simultaneously with Cancelled (e.g. when using move_on_after()).
    Without properly catching and filtering the resulting MultiError, an
    unhandled exception will occur.  Often what is desired in this case is
    for the Cancelled exception alone to propagate to the cancel scope.

    :param args:  One or more exception types which will defer to
        trio.Cancelled.  By default, all exception types will be filtered.

    Example:

        # If MultiError([Cancelled, Obstacle]) occurs, propagate only Cancelled
        # to the parent cancel scope.
        with defer_to_cancelled(Obstacle):
            try:
                # async call which may raise exception as part of API
                await advance(speed)
            except Obstacle:
                # handle API exception (unless Cancelled raised simultaneously)
                ...

    TODO: Support consolidation of simultaneous user API exceptions (i.e.
        MultiError without Cancelled).  This would work by prioritized list
        of exceptions to defer to.  E.g. given:
            [Cancelled, WheelObstruction, RangeObstruction]
        then:
            Cancelled + RangeObstruction => Cancelled
            WheelObstruction + RangeObstruction => WheelObstruction
    """
    try:
        yield
    except trio.MultiError as e:
        exceptions = e.exceptions  # pylint: disable=no-member
        if not any(isinstance(exc, trio.Cancelled) for exc in exceptions):
            raise
        if not args:
            raise trio.MultiError.filter(  # pylint: disable=raising-bad-type
                lambda exc: exc if isinstance(exc, trio.Cancelled) else None,
                e)
        raise trio.MultiError.filter(  # pylint: disable=raising-bad-type
            lambda exc: None if isinstance(exc, args) else exc,
            e)

belm0 avatar May 28 '19 06:05 belm0

Since we've taken a lot of inspiration from trio, I wanted to throw (back?) in a few cents for the error handling ergonomics...

As try: raise b; except E: ... is based on issubclass(type(b), E), it is possible to have custom exception matching by implementing type(E).__subclasscheck__. The TLDR is that one can do something like this:

try:
    async with something:
        something.raise_concurrent()
except MultiError[A]:
     print("Handled concurrent A")
except MultiError[A, B] as err:
     print("Handled concurrent A and B")
except (MultiError[B], MultiError[C]) as err:
     print("Handled concurrent B or C:", err)
except A:
    print("Handled non-concurrent A")

This gives you basically the entire native Python exception handling experience, but for multiple errors as well.

We're currently experimenting with the exact semantics in our own async simulation framework, but the approach is solid. It requires a MetaClass, but the upside is that one can use native Python exception handling. The approach allows for covariance, e.g. issubclass(MultiError[KeyError], MultiError[LookupError]), and exclusive (MultiError[IndexError, KeyError]) or inclusive (MultiError[IndexError, KeyError, ...]) matching. Due to covariance we haven't found a way to consistently match a fixed number of each type, i.e. MultiError[KeyError] is MultiError[KeyError, KeyError] in our prototype.

Of course that allows to implement most rules one see fit. Three rules seem to work well for usability in our case, but are subjective:

  • Separate async and sync errors. The example of async with open_websocket("https://..."): raise ValueError would always propagate a regular ValueError. This allows "nurseries" to remain an implementation detail - which should be handled by the implementor, not the user.
  • Never coerce/match sync and async errors. This makes it possible to have a reliable interface when handling except Exception as err: or except MultiError[Exception] as err:. I.e. filter/split/... for MultiError[E] and the bare exception for just E. That also means our MultiError must be a BaseException.
  • Privileged exceptions are propagated unwrapped. SystemExit, KeyboardInterrupt, and AssertionError mean that we are tearing down, and there is no point recovering. If our "nursery" encounters these as either sync or async errors, they replace every other exception in flight. GeneratorExit and our own internal Interrupt are never propagated by our "nursery" to another Task. For their own task, GeneratorExit and unhandled Interrupt take precedence over everything.

That basically means users do not have to worry about manually propagating cancel exceptions and the like. One can also add/remove nurseries without changing how exceptions propagate through the main path.

Our own "nursery" equivalent is modelled closely after trio, so similar rules may be worth considering for trio as well. Since we are dealing with a simulation only, I am not entirely sure whether trio can enforce the same guarantees on time, though.

maxfischer2781 avatar Sep 03 '19 11:09 maxfischer2781

In principle we could dynamically set each MultiError object's base classes when it's created, based on this rule. But this doesn't seem very useful in practice... a MultiError([OSError, OSError]) could be caught by except OSError:, but it wouldn't have an .errno attribute.

Not to mention that it is actually two different exceptions, and therefore could have two different errnos.

I don't think that in general, code which was written to specifically handle "an exception" can be silently promoted to handling "a collection of exceptions" - even if they are of the same type. Handling a specific OSError only makes sense within the same thread of execution anyway. For example, say you have an exception handler which catches FileNotFoundError - it only makes sense to do this if you know which file you just tried to access and what you were trying to do. (os.unlink and os.open can both raise FileNotFoundError)

Hence having MultiError as a wrapper for "error(s) which occurred in other thread(s) of execution" is helpful to distinguish them anyway.

However, having Cancelled take precedence within a group of exceptions also makes sense to me. Could this be done simply by Cancelled being a different kind of MultiError? For example, say it contains a list of errors as an attribute, and Cancelled is a similar but different type of MultiError?

try:
    ...
except Cancelled:  # MultiError where at least one is Cancelled
    ...
    raise
except MultiError: # other kind of MultiError
    ...

Taking this a little further, you could make the multi-error exception be:

  • Cancelled (derived from BaseException), if any of them are Cancelled
  • else BaseMultiError (derived from BaseException), if any of them is outside of Exception (e.g. KeyboardInterrupt)
  • else MultiError (derived from Exception)

Is that too much hard-coding of hierarchy? Maybe it's not necessary if KeyboardInterrupt were converted into Cancelled at the first opportunity.

Which makes me think: what if all BaseExceptions were wrapped in Cancelled?? Cancelled is already derived from BaseException, so in that case MultiError could be derived from Exception.

Sorry, just random ideas :-)

candlerb avatar Oct 14 '19 10:10 candlerb

Hi all - what's the current state of the ExceptionGroup effort?

Hypothesis is currently considering how to generalise minimal examples, and one of the major challenges is that our status quo for reporting multiple errors is not as helpful as we'd like it to be (for example, it doesn't support pdb).

So if there is or soon will be a widely shared way of doing this which we could support, that would probably improve the lives of our users as well as developers! I'd be happy to help out too, if there's some way to do so 😄

Zac-HD avatar Nov 18 '19 00:11 Zac-HD

I thought I understood MultiError, but recently I was surprised by this mistake I made in a WebSocket server. Each connection has a heartbeat task (sends a WebSocket ping frame every n seconds and waits for a pong) and a request handler task. If the client side closes the connection, most of the time the heartbeat will be waiting in an await trio.sleep(n). In that scenario, the request handler raises ConnectionClosed and the heartbeat task raises Cancelled. My exception handler used a filter to remove Cancelled so that I end up with just a bare exception that I can catch:

try:
    with trio.MultiError.catch(remove_cancelled):
        await connection.run()
except ConnectionClosed:
    logger.info('Connection closed %s@%s', user, client_ip)
except Exception:
    logger.exception('Connection exception %s@%s', user,
        client_ip)

There are two errors here:

  1. MultiError inherits from BaseException, so my fallback handler won't catch it, and it crashes the server instead. This is just a stupid error on my part.
  2. The more interesting error is the assumption that a closed connection always returns a single ConnectionClosed. If a client disconnects while a ping frame is in flight, then both of the connection's tasks will raise ConnectionClosed, which means my filter still returns a MultiError. This rare race condition was occasionally crashing my server.

I ended up writing a new filter that removes cancellations and all but one ConnectionClosed, and then added MultiError to the last except. I don't know if Trio can do anything to make this easier to reason about. I recognize that part of the problem is the inherent complexity of writing concurrent code and my own lack of experience here. But I thought it was an interesting real-world scenario that might be relevant to the design of MultiError. The v2 design will definitely help because it won't collapse MultiError, but I don't know if there's an ergonomic way to express, "if all of the exceptions are Cancelled or ConnectionClosed then do this, else do that". This example deals with trio_websocket but in principle I think you'd have the same phenomenon when using bare Trio streams.
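
With the proposed v2 split, that condition could probably be written directly, something like this (a sketch against the proposed API, reusing the names from the snippet above):

try:
    await connection.run()
except BaseException as exc:
    matched, rest = MultiError.split((trio.Cancelled, ConnectionClosed), exc)
    if rest is None:
        # everything in there was Cancelled and/or ConnectionClosed
        logger.info('Connection closed %s@%s', user, client_ip)
    else:
        raise  # something else went wrong too; let it propagate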

mehaase avatar Feb 28 '20 15:02 mehaase

@mehaase I think what you want is a decorator like @defer_to_exception(ConnectionClosed), where a MultiError with any combination of ConnectionClosed and Cancelled will yield ConnectionClosed. It could be generalized to take a list of exceptions in priority order, where any combination of those exceptions and Cancelled will yield the highest priority exception.

It's along the lines of @defer_to_cancelled in trio-util, which does the opposite (defers low-priority exceptions to Cancelled).

https://trio-util.readthedocs.io/en/latest/#trio_util.defer_to_cancelled

In fact I think it could be generalized across the two cases by including Cancelled in the exception list: @defer_to_exception(ConnectionClosed, Cancelled, MyLowPriorityException), where the arg list is from higher to lower priority.

belm0 avatar Feb 28 '20 21:02 belm0

I tried to make an API and implementation for a general multi_error_defer_to() context manager mentioned previously, which accepts an ordered list of privileged exception types and coalesces a MultiError into the highest priority exception as long as every exception in the hierarchy is an instance of the listed exceptions. There were several subtle considerations:

  • should it be strict about not choosing an arbitrary exception object among several different candidates (per repr())? - Merging Cancelled() and Cancelled() is fine, but how about ValueError('foo') and ValueError('bar'), or exception objects of two derived types matching the base type in the privileged list? I wouldn't want the non-determinism, but perhaps offering it as an option would suit someone else. (Unfortunately it's a bit costly to implement the strictness check.)
  • should it ensure that MultiError is never raised? - I can imagine use cases where you want to ensure MultiError is not leaked from your API, so it seems like there should be an option. If the MultiError cannot be coalesced, a RuntimeError is raised.
def multi_error_defer_to(*privileged_types: Type[BaseException],
                         propagate_multi_error=True,
                         strict=True):
    """
    Defer a trio.MultiError exception to a single, privileged exception

    In the scope of this context manager, a raised MultiError will be coalesced
    into a single exception with the highest privilege if the following
    criteria is met:
        1. every exception in the MultiError is an instance of one of the given
           privileged types
    additionally, by default with strict=True:
        2. there is a single candidate at the highest privilege after grouping
           the exceptions by repr().  For example, this test fails if both
           ValueError('foo') and ValueError('bar') are the most privileged.

    If the criteria are not met, by default the original MultiError is
    propagated.  Use propagate_multi_error=False to instead raise a
    RuntimeError in these cases.

    Synopsis:
        multi_error_defer_to(trio.Cancelled, MyException)
            MultiError([Cancelled(), MyException()]) -> Cancelled()
            MultiError([Cancelled(), MyException(), MultiError([Cancelled(), Cancelled())]]) -> Cancelled()
            MultiError([Cancelled(), MyException(), ValueError()]) -> *no change*
            MultiError([MyException('foo'), MyException('foo')]) -> MyException('foo')
            MultiError([MyException('foo'), MyException('bar')]) -> *no change*

        multi_error_defer_to(MyImportantException, trio.Cancelled, MyBaseException)
            # where isinstance(MyDerivedException, MyBaseException)
            #   and isinstance(MyImportantException, MyBaseException)
            MultiError([Cancelled(), MyDerivedException()]) -> Cancelled()
            MultiError([MyImportantException(), Cancelled()]) -> MyImportantException()
    """

belm0 avatar Mar 15 '20 12:03 belm0

I've published multi_error_defer_to() to trio-util:

https://github.com/groove-x/trio-util/blob/76254159ee6710659245420b5ed2a3f446dbf83b/src/trio_util/_exceptions.py#L74-L154

belm0 avatar Jun 29 '20 07:06 belm0

ExceptionGroup([SystemExit()]) needs special handling in the interpreter

ExceptionGroup([KeyboardInterrupt()]) needs special handling also https://github.com/python/cpython/pull/21956/files#diff-75445bdc3b6b3dd20b005698fa165444R290

graingert avatar Sep 24 '20 14:09 graingert