cattrs

How do I extract error information from the exception(s) raised when structuring?

Open pfmoore opened this issue 2 years ago • 5 comments

  • cattrs version: 22.1.0
  • Python version: 3.10.3
  • Operating System: Windows 11

Description

When I structure a raw dictionary into a data class, if there's one or more problems, I get an exception (which apparently can be an exception group, using the backport of a new Python 3.11 feature). The raw traceback of the exception is pretty user-unfriendly.

How do I get the details of what failed, in a form that I can use to report to the end user as an error report showing "user facing" terminology, not technical details?

What I Did

>>> cattrs.structure(["a", "b"], list[int])
  + Exception Group Traceback (most recent call last):
  |   File "<stdin>", line 1, in <module>
  |   File "C:\Work\Projects\wheeliebin\.venv\lib\site-packages\cattrs\converters.py", line 281, in structure
  |     return self._structure_func.dispatch(cl)(obj, cl)
  |   File "C:\Work\Projects\wheeliebin\.venv\lib\site-packages\cattrs\converters.py", line 470, in _structure_list
  |     raise IterableValidationError(
  | cattrs.errors.IterableValidationError: While structuring list[int] (2 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "C:\Work\Projects\wheeliebin\.venv\lib\site-packages\cattrs\converters.py", line 463, in _structure_list
    |     res.append(handler(e, elem_type))
    |   File "C:\Work\Projects\wheeliebin\.venv\lib\site-packages\cattrs\converters.py", line 380, in _structure_call
    |     return cl(obj)
    | ValueError: invalid literal for int() with base 10: 'a'
    | Structuring list[int] @ index 0
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "C:\Work\Projects\wheeliebin\.venv\lib\site-packages\cattrs\converters.py", line 463, in _structure_list
    |     res.append(handler(e, elem_type))
    |   File "C:\Work\Projects\wheeliebin\.venv\lib\site-packages\cattrs\converters.py", line 380, in _structure_call
    |     return cl(obj)
    | ValueError: invalid literal for int() with base 10: 'b'
    | Structuring list[int] @ index 1
    +------------------------------------

>>> try:
...     cattrs.structure(["a", "b"], list[int])
... except Exception as exc:
...     # Not sure what to do here...

What I'd like to be able to do is get the two ValueError exceptions, as well as where they happened (list[int] @ index 1). If I simply print exc, it returns While structuring list[int] (2 sub-exceptions), which is user-friendly, but omits the information needed to tell the user how to fix the issue.

With a bit of digging, it seems as if I can do something like

for x in exc.exceptions:
  print(f"{x}: {x.__note__}")

But it's possible to get nested exception groups, so this is incomplete. And __note__ seems to be a string which I can't introspect to produce output in a different format. The only way I can see of having full control over how any issues are reported, is to pre-validate the data structure, at which point I might as well write my own structuring code at the same time...
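For the nesting problem specifically, a recursive walk is one option. The following is only a minimal sketch and not part of cattrs: `iter_leaf_exceptions` is a hypothetical helper, and it assumes that anything exposing an `exceptions` attribute is an exception group (true for the built-in `ExceptionGroup` and the backport). A plain exception is yielded unchanged, so the same loop handles both cases.

```python
from typing import Iterator


def iter_leaf_exceptions(exc: BaseException) -> Iterator[BaseException]:
    """Yield every non-group exception reachable from exc, depth-first."""
    # Assumption: only exception groups carry an `exceptions` attribute.
    children = getattr(exc, "exceptions", None)
    if children is None:
        # A plain exception: nothing to expand.
        yield exc
        return
    for child in children:
        # A child may itself be a group, so recurse.
        yield from iter_leaf_exceptions(child)
```

This flattens the tree but throws away where each leaf sat in it, so it solves only the nesting half of the problem, not the location half.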

Am I missing something here, or am I trying to use the library in a way that wasn't intended? (My use case is extremely close to user input validation, it's just that the user input is in TOML, which I can parse easily with a library, but in doing so I defer all the validation of data structures, so my "unstructured data" is actually error-prone user input in all but name...)

pfmoore avatar Apr 25 '22 21:04 pfmoore

That's a great question, and doing this is definitely one of the use cases for cattrs. The problem is I don't know what the output format would be, so I left it kind of unfinished for the first version. I guess the idea was I'd leave the parsing of the ExceptionGroup to users until a consensus on how this is supposed to work materializes.

I remember @hynek mentioned liking how Voluptuous reports errors, but I haven't had the time to really dig into it.

One other thing to mention is that we actually raise a couple of subclasses of ExceptionGroups (you're looking at IterableValidationError there in your traceback), so that information can help in parsing too.

In any case, this is a great discussion to have for the next version of cattrs.

Tinche avatar Apr 25 '22 23:04 Tinche

Thanks. My feeling is that exception groups are a reasonable idea for how to handle the problem of having multiple potential problems in a single conversion. The issues I have with the current approach (which may well be issues with the exception group API itself rather than with cattrs) are:

  1. Sometimes I get an exception group, and sometimes I get a "plain" exception. I can't reliably reproduce this (it may involve exceptions from structuring hooks). But this makes handling more complex than it should be.
  2. The default str() of an exception group loses too much information (the "2 sub-exceptions" summary), but the only way to get individual details of where exceptions occurred is by parsing the __note__ string (which is horrid, and presumably unsupported as I doubt you want to commit to that string never changing). I seem to recall a discussion on Discourse when __note__ was added. I understand why people wanted structured data now :-(
  3. None of this is documented. A section in the docs covering "Producing user-friendly errors" would be awesome.

I'll have a think about how I can give better examples of what I'm after. I might be able to trim down my use case into a small example project and link to that - showing some structuring code, with an example of the sort of error messages I'd like to see.

pfmoore avatar Apr 26 '22 08:04 pfmoore

Here's a small example. It doesn't cover all of the questions I have (for a start, it somehow doesn't raise an exception group in the case where there are 2 errors!) but it gives an idea of what I'm hoping for.

from packaging.version import Version
import attrs
import cattrs

def parse_value(value, typ):
    if isinstance(value, typ):
        return value
    return typ(value)

@attrs.define
class Example:
    name: str
    version: Version
    n: int | None = 3

conv = cattrs.Converter()
conv.register_structure_hook(Version, parse_value)
conv.register_structure_hook(int, parse_value)

def check(d):
    try:
        s = conv.structure(d, Example)
        print(s)
    except Exception as exc:
        print(str(exc))
        print("Exception type:", type(exc))

valid = {"name": "foo", "version": "1.0"}
check(valid)
print("Expected:\n  No error\n")

invalid1 = {"version": "1.0", "n": "12"}
check(invalid1)
print("Expected:\n  Missing mandatory field 'name'\n")

invalid2 = {"version": "wrong", "n": "xx"}
check(invalid2)
print("Expected:\n  Invalid field 'version': 'wrong'\n  Invalid field 'n': 'xx'\n")

invalid3 = {"n": 9}
check(invalid3)
print("Expected:\n  Missing mandatory fields 'name' and 'version'\n")

Output:

Example(name='foo', version=<Version('1.0')>, n=3)
Expected:
  No error

Example.__init__() missing 1 required positional argument: 'name'
Exception type: <class 'TypeError'>
Expected:
  Missing mandatory field 'name'

Invalid version: 'wrong'
Exception type: <class 'packaging.version.InvalidVersion'>
Expected:
  Invalid field 'version': 'wrong'
  Invalid field 'n': 'xx'

Example.__init__() missing 2 required positional arguments: 'name' and 'version'
Exception type: <class 'TypeError'>
Expected:
  Missing mandatory fields 'name' and 'version'

The two cases of TypeError are almost what I want, if only I could change the exception type (and hence the message) while still retaining the list of names that are missing.

The case with 2 errors fails on two counts: first, it only seems to report one of the issues, and second, it doesn't tell me the field name ("Invalid version" in the error is the text of the exception, not the field name).

Maybe I'm expecting too much here (in this case, the output is almost what I want, although that's at least in part because for some reason it's not raised an exception group). It's possible that something like Voluptuous would be more suitable for me, pre-validating the structure. Thanks for that link, by the way, I'd never heard of that library.

pfmoore avatar Apr 26 '22 09:04 pfmoore

I'm going to be diving into this this week (partially because I need better functionality here too). Thanks for the example, these kinds of use cases are exactly what I need to flesh out this functionality.

Tinche avatar May 02 '22 13:05 Tinche

Hm it looks like the __note__ PEP got changed in the meantime (https://peps.python.org/pep-0678/), and it now has .add_note() and __notes__ instead of just __note__. I'll need to account for this in the next version of cattrs. The PEP also recommends against putting structured data in the notes, so if we want to parse the exception information we should probably use a dedicated attribute for this instead.

Tinche avatar May 03 '22 10:05 Tinche

The case with 2 errors fails on two counts: first, it only seems to report one of the issues, and second, it doesn't tell me the field name ("Invalid version" in the error is the text of the exception, not the field name).

This is something I would be very interested in as well. With attrs alone, I'm able to extract the name of the attribute from the exception.

from attrs import define, field, validators


@define
class Person:
    first_name: str = field(validator=validators.instance_of(str))
    last_name: str = field(validator=validators.instance_of(str))


try:
    pers = Person(first_name=1, last_name="")
except Exception as e:
    _, attr, _, _ = e.args
    print(attr.name)

klausmcm avatar Nov 24 '22 01:11 klausmcm

Hi! 👋 I was just looking for the same functionality (getting user-friendly error messages) and stumbled across this discussion. So let me quickly upvote this feature request 👍😄

I really like the modular approach of attrs/cattrs a lot and would love to switch from pydantic to this framework in some of my projects. However, one reason why I have not yet done so is because pydantic really has an edge when it comes to error messages. In fact, I've had quite a bit of trouble understanding the error messages from cattrs myself, even when only doing "backend-related" conversions, so I would imagine that things could become even more cryptic when trying to parse actual user input.

Are there any specific plans already for improving the output?

AdrianSosic avatar Mar 17 '23 13:03 AdrianSosic

I'm kind of thinking about this in the background.

Possible solutions:

  1. use a custom class instead of a string for the __notes__ note, with metadata and an overridden __str__ to appear as a bare error message
  2. use a string subclass, that again would have the correct metadata
  3. provide regexps that can parse the string

PEP 678 mandates ExceptionGroup notes be strings, so solution 1 will probably run into issues with tooling somewhere down the line. Solution 3) will probably be a little slow (performance-wise), hard to use and won't allow us to serialize Python types (just strings).

So solution 2) is probably the way to go. I was inspired by the Lark parser library, which also uses string subclasses for some of its functionality.
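To make solution 2 concrete, here is a minimal sketch of a string subclass that carries metadata. The names `IndexNote` and `attach_note` are hypothetical and chosen for illustration; they are not cattrs API. The note is still a real `str`, so tooling that expects string notes keeps working, while code that knows about the subclass can read the extra attribute.

```python
class IndexNote(str):
    """A note that prints like a plain string but also records an index."""

    index: int

    def __new__(cls, text: str, index: int) -> "IndexNote":
        note = super().__new__(cls, text)
        note.index = index  # structured metadata lives next to the text
        return note


def attach_note(exc: BaseException, note: str) -> None:
    # Same effect as exc.add_note(note) on Python 3.11+,
    # written out so that it also works on older versions.
    exc.__notes__ = getattr(exc, "__notes__", []) + [note]


err = ValueError("invalid literal for int() with base 10: 'b'")
attach_note(err, IndexNote("Structuring list[int] @ index 1", index=1))

note = err.__notes__[0]
print(isinstance(note, str), note, note.index)
```

The trade-off is that the metadata is lost if anything copies the note with `str(note)`, so consumers have to read it from the original object.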

Tinche avatar Mar 17 '23 14:03 Tinche

Assuming you'd like to avoid rewrapping individual validation errors, instead of trying to shoehorn structured data in __notes__ (and if catching and reraising the exception group to expand it is acceptable), would a method on the exception group subclass not be a better fit? Something along the lines of:

class BaseValidationError(ExceptionGroup):
    def get_structured_exceptions(self) -> tuple[Self | StructuredException, ...]:
        ...
    
@frozen
class StructuredException:
    loc: list[str | int]
    obj: Exception


[...]


try:
    cattrs.structure(foo, SomeType)
except BaseValidationError as exc_group:
    exc_group.get_structured_exceptions()

layday avatar Mar 17 '23 14:03 layday

Howdy, so here's a status update.

I've decided to go with a string subclass for the __notes__, but I think I've managed to hide it behind a nicer facade, so unless you need a large degree of customization, you don't need to know about it.

I've introduced a function, cattrs.transform_error. When a converter with detailed validation produces an error, you feed that error into this function and it'll produce a list of error messages (list[str] is the result type). To give you a feeling of what this looks like:

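from attrs import define
from cattrs import Converter, transform_error

c = Converter()  # detailed validation is enabled by default

@define
class C:
    a: int
    b: int = 0

try:
    c.structure({}, C)
except Exception as exc:
    # The mandatory field `a` is absent from the input
    assert transform_error(exc) == ["required field missing @ $.a"]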

and

from typing import List

# c is a cattrs Converter with detailed validation enabled
try:
    c.structure(["str", 1, "str"], List[int])
except Exception as exc:
    assert transform_error(exc) == [
        "invalid value for type, expected int @ $[0]",
        "invalid value for type, expected int @ $[2]",
    ]

The error messages have a description and a path (that's the $.a thing).

transform_error can be parametrized by giving it another callable, format_exception. This callable takes an Exception (and an optional target type) and returns an error description. I've included a small implementation of format_exception in the cattrs.v module. The idea is you can wrap this function with your own to customize error descriptions, and give transform_error your own version.

Ok, so now for the internal changes. I've implemented two string subclasses, AttributeValidationNote and IterableValidationNote. They are simply strings with some added metadata (like the attribute name and type for AttributeValidationNote, and the index for IterableValidationNote), and they get attached like regular notes to exceptions while structuring.

IterableValidationError and ClassValidationError each have a new helper method, group_exceptions. It examines the subexceptions and groups them into two sets: exceptions with notes and exceptions without.

cattrs.transform_error uses this knowledge to parse the given exception into a list of error messages. It's just DAG processing (the exceptions form a non-cyclical tree). If you need greater customization, you can copy/paste the transform_error function and change what you need, or use it as an inspiration to write your own.
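For anyone writing their own version, the tree walk can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual transform_error implementation: `AttrNote`, `IndexNote` and `describe` are hypothetical names, the notes are simplified stand-ins for the real note classes, and groups are detected by their `exceptions` attribute.

```python
from typing import List


class AttrNote(str):
    """Hypothetical note carrying the name of the attribute that failed."""

    def __new__(cls, text: str, name: str) -> "AttrNote":
        note = super().__new__(cls, text)
        note.name = name
        return note


class IndexNote(str):
    """Hypothetical note carrying the index of the element that failed."""

    def __new__(cls, text: str, index: int) -> "IndexNote":
        note = super().__new__(cls, text)
        note.index = index
        return note


def describe(exc: BaseException, path: str = "$") -> List[str]:
    """Flatten a tree of exceptions into 'message @ path' strings."""
    children = getattr(exc, "exceptions", None)
    if children is None:
        # Leaf: report the exception text at the path built so far.
        return [f"{exc} @ {path}"]
    messages: List[str] = []
    for child in children:
        child_path = path
        # Extend the path using whatever location metadata the child carries.
        for note in getattr(child, "__notes__", []):
            if isinstance(note, AttrNote):
                child_path = f"{path}.{note.name}"
            elif isinstance(note, IndexNote):
                child_path = f"{path}[{note.index}]"
        messages.extend(describe(child, child_path))
    return messages
```

Because the path is passed down as an argument rather than stored on the exceptions, the walk leaves the original exception tree untouched and can be rerun with a different root path or message format.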

I need to flesh out the tests some more, and write docs. I plan on releasing this with the next version, which should come before PyCon hopefully.

Tinche avatar Mar 26 '23 16:03 Tinche

cool, transform_error looks simple enough to re-implement as long as it remains a leaf function, that should be all I need!

P.S. don't jinx your releases by saying you'll get it done before PyCon or at the PyCon sprints. 🤪

hynek avatar Mar 29 '23 07:03 hynek

Merged! Thanks everyone.

Let's open new issues for anything that comes up.

Tinche avatar Mar 31 '23 01:03 Tinche