structure: how to report path to invalid data element

vlcinsky opened this issue 7 years ago • 8 comments

  • cattrs version: 0.9.0
  • Python version: 3.6.ř
  • Operating System: Debian 9

Description

I want to use cattrs to load complex nested data and, in case some validation/conversion fails, I want to provide reasonable context information about which part of the data did not work properly.

What I Did

Having attrs based classes: Config with attributes source, fetch and publish, each holding value of specific (attrs based) class Source, Fetch and Publish.

If some data element is wrong (e.g. an integer is expected and the string "5a" is provided), the structure process fails, raising ValueError("could not convert string to float: '5a'",)

However, the error does not include any contextual information about where in my nested input the problem was read from.

It would be nice to get some sort of path in the exception, which I could use. marshmallow and trafaret are examples of similar solutions providing contextual information.
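A minimal sketch of the situation, using hand-rolled stand-ins instead of cattrs itself. The field names `url`, `interval` and `target` are hypothetical and only serve to give the nested Config some shape; the point is that the resulting message does not say which part of the input failed.

```python
# Hand-rolled stand-ins for the structuring step; cattrs itself is not used.
# The field names "url", "interval" and "target" are hypothetical.

def structure_source(data):
    return {"url": str(data["url"])}


def structure_fetch(data):
    return {"interval": float(data["interval"])}


def structure_publish(data):
    return {"target": str(data["target"])}


def structure_config(data):
    return {
        "source": structure_source(data["source"]),
        "fetch": structure_fetch(data["fetch"]),
        "publish": structure_publish(data["publish"]),
    }


raw = {
    "source": {"url": "http://example.com/feed"},
    "fetch": {"interval": "5a"},  # invalid: not a number
    "publish": {"target": "out"},
}

try:
    structure_config(raw)
except ValueError as exc:
    # The message names the bad value, but not where it came from.
    message = str(exc)
    print(message)
```

With three or four levels of nesting and several numeric fields, the message alone is not enough to locate the offending element.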

vlcinsky avatar Oct 24 '18 21:10 vlcinsky

Here is a possibly quite crazy idea for how to report an error including the path within the input data that led to the failure.

Requirements:

  • initially focus only on structure and assume input data in form of dict
  • allow reporting path to input data element causing raised conversion error
  • keep required changes to code (structure_hook functions) to bare minimum
  • tolerate structure_hook implementations that do not implement the new approach (possibly at the cost of losing part of the path information)
  • run fast, try to avoid any extra operations during happy scenario

Concept:

  • path has the form of a list of __getitem__ arguments, e.g. ["oak", 1] for data["oak"][1]
  • no need to cover non-iterable data types
  • focus on iterables. Store current position of iteration in local variable with agreed name cattrs_i. This is the only required change to structure_hook function implementation.
  • all path detection to be done within cattr.structure function
    • detection is done by traversing traceback stack, inspecting local variables and collecting all values of cattrs_i variables in resulting path list.
    • store the path in the .path exception property and re-raise the caught exception

Here is code to demonstrate how to detect the path from an exception raised in deeply nested call. If you store the code into test_path_detection.py, it shall be executable using pytest (expecting python 3.6+).

def int_structure_hook(val, dtype):
    return int(val)


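def list_structure_hook(lst, dtype):
    result = []
    # An explicit loop keeps cattrs_i in this function's own frame locals;
    # a comprehension's loop variable may not be visible there after the
    # error on newer Python versions.
    for cattrs_i, itm in enumerate(lst):
        result.append(int_structure_hook(itm, int))
    return result


def dict_structure_hook(dct, dtype):
    result = {}
    # cattrs_i holds the current key; return a dict rather than a list.
    for cattrs_i, val in dct.items():
        result[cattrs_i] = list_structure_hook(val, list)
    return result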


def structure(val):
    try:
        return dict_structure_hook(val, dict)
    except ValueError as exc:
        path = []
        tb = exc.__traceback__
        while tb:
            frame_locals = tb.tb_frame.f_locals
            # Test for presence, not truthiness: index 0 is a valid path element.
            if "cattrs_i" in frame_locals:
                path.append(frame_locals["cattrs_i"])
            tb = tb.tb_next
        exc.path = path
        raise


def test_it():
    try:
        res = structure({"oak": [1, "0aa", 3], "birch": [9, 2, 0]})
        print(f"Happy result is: {res}")
    except ValueError as exc:
        print(f"Path {exc.path}: has problem: {exc}")

When called: $ pytest test_path_detection.py -sv the printed output related to reported path is

Path ['oak', 1]: has problem: invalid literal for int() with base 10: '0aa'

What do you think of that? The results are not perfect, but in many cases they help to navigate close to the source of the problem. It would definitely require (small) modifications to existing converters.

vlcinsky avatar Oct 25 '18 10:10 vlcinsky

This is something I would definitely like to support, since getting an error somewhere deep can be very annoying indeed. Need to think about it.

Tinche avatar Oct 25 '18 13:10 Tinche

@Tinche take your time, it is not an easy problem.

Here is an alternative method: pass the path via an explicit argument to the conversion function:

"""alternative passing path context via argument `path`

Converters have the signature: func(val, dtype, *path)
where `path` is the path to the current element (a tuple of values)

When calling, one passes the original `path` value with * and adds the new selector to the end

fun(val, dtype, *path, index)

which results in an extended `path` value within the deeper function.
"""


def int_structure_hook(val, dtype, *path):
    return int(val)


def list_structure_hook(lst, dtype, *path):
    return [int_structure_hook(itm, int, *path, i) for i, itm in enumerate(lst)]


def dict_structure_hook(dct, dtype, *path):
    # Build a dict so the structured result keeps its keys.
    return {key: list_structure_hook(val, list, *path, key) for key, val in dct.items()}


def structure(val, dtype):
    try:
        return dict_structure_hook(val, dict)
    except ValueError as exc:
        path = []
        tb = exc.__traceback__
        while tb:
            deeper_path = tb.tb_frame.f_locals.get("path")
            if deeper_path:
                path = deeper_path
            tb = tb.tb_next
        exc.path = path
        raise exc


def test_it():
    try:
        res = structure({"oak": [1, "0aa", 3], "birch": [9, 2, 0]}, dict)
        print(f"Happy result is: {res}")
    except ValueError as exc:
        print(f"Path {exc.path}: has problem: {exc}")
        assert exc.args[0] == "invalid literal for int() with base 10: '0aa'"
        assert isinstance(exc, ValueError)
        assert exc.path == ("oak", 1)

To avoid confusion with intermediate functions that use a path variable for other purposes, the __traceback__ traversal may check that the given locals belong to a function which is registered with the converter.
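The check described above could be sketched roughly as follows. The `register_hook` decorator, the `_HOOK_CODES` set and the `_parse_int` helper are hypothetical names made up for this illustration and are not part of cattrs. The idea is to remember the code object of every registered hook and to read `path` only from frames running one of those code objects.

```python
_HOOK_CODES = set()  # hypothetical registry of code objects of known hooks


def register_hook(func):
    """Hypothetical decorator: remember the hook's code object."""
    _HOOK_CODES.add(func.__code__)
    return func


def _parse_int(text):
    # Unregistered helper with an unrelated local that happens to be named `path`.
    path = "base-10"
    return int(text)


@register_hook
def int_hook(val, dtype, *path):
    return _parse_int(val)


@register_hook
def list_hook(lst, dtype, *path):
    return [int_hook(itm, int, *path, i) for i, itm in enumerate(lst)]


@register_hook
def dict_hook(dct, dtype, *path):
    return {key: list_hook(val, list, *path, key) for key, val in dct.items()}


def structure(val):
    try:
        return dict_hook(val, dict)
    except ValueError as exc:
        path = ()
        tb = exc.__traceback__
        while tb is not None:
            frame = tb.tb_frame
            # Only trust `path` in frames that execute a registered hook.
            if frame.f_code in _HOOK_CODES:
                path = frame.f_locals.get("path", path)
            tb = tb.tb_next
        exc.path = path
        raise


try:
    structure({"oak": [1, "0aa", 3], "birch": [9, 2, 0]})
except ValueError as exc:
    print(f"Path {exc.path}: has problem: {exc}")
```

Without the `f_code` check, the deepest frame with a non-empty `path` local would be `_parse_int`, and the reported path would be the unrelated string instead of the location in the input.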

vlcinsky avatar Oct 25 '18 14:10 vlcinsky

Incidentally, one of the hardest things to debug is when you have a NoneType that can't be converted into whatever the expected type is. Without a path, currently there's no way to even guess at which of the many nulls in your input it's failing on.

petergaultney avatar Dec 13 '18 13:12 petergaultney

We could copy a few ideas from jsonschema, and potentially yield errors iteratively? Or maybe that's not really in scope, since cattrs needs to return the new result.

But have a look at their ValidationError, there are a few fields there that we could potentially use.
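As a rough illustration of yielding errors iteratively, here is a stdlib-only sketch. `StructureError` and `iter_errors` are hypothetical names chosen for this example; they are neither jsonschema's nor cattrs' API, and the input shape (a dict of lists of ints) is assumed to match the examples earlier in the thread.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Tuple


@dataclass
class StructureError:
    """Hypothetical error record: where the problem is and what went wrong."""
    path: Tuple[Any, ...]
    message: str


def iter_errors(data: dict) -> Iterator[StructureError]:
    """Yield one StructureError per element that cannot be converted to int."""
    for key, items in data.items():
        for index, item in enumerate(items):
            try:
                int(item)
            except (TypeError, ValueError) as exc:
                yield StructureError((key, index), str(exc))


errors = list(iter_errors({"oak": [1, "0aa", 3], "birch": [None, 2, 0]}))
for error in errors:
    print(error.path, error.message)
```

Collecting every failure in one pass also covers the None case mentioned above, since each record carries its own path. The trade-off is that such a function reports problems but does not return the structured result.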

madsmtm avatar Jan 24 '19 20:01 madsmtm

Hey, sorry for the kinda necro-post, but has there been any progress on this? It would really be very handy to have this feature :)

Tmpod avatar Nov 16 '20 16:11 Tmpod

This is probably the next big feature I work on :)

Tinche avatar Nov 19 '20 01:11 Tinche

Nice to hear it! Sorry for the question, but do you have any ETA for it?

Tmpod avatar Nov 22 '20 15:11 Tmpod

So there's https://catt.rs/en/stable/validation.html#transforming-exceptions-into-error-messages in the last release, 23.1.x.

I'm going to close this as complete, let's open new tickets for any desired improvements!

Tinche avatar Jul 08 '23 01:07 Tinche