
Question/feature request: Map a (nested) JSON/dict Key to a Field

Open iwconfig opened this issue 3 years ago • 3 comments

It is possible to map JSON keys to fields with different names. However, I would like to map nested JSON/dict values by key path, preferably with dot-delimited strings such as

'parent.child.grandchild[1].sibling'

or as lists of keys

['parent', 'child', 'grandchild', 1, 'sibling']

So it would look something like

class Meta(JSONSerializable.Meta):
        json_key_to_field = {
            'parent.child.grandchild[1].sibling': 'my_str'
        }

or

json_field(
        ('myJSONKey',
         'parent.child.grandchild[1].sibling',
         ('parent2', 'child', 'grandchild', 4, 'sibling')),
        'myField'
)

As of yet, I have not been able to come up with a way to accomplish this with dataclass-wizard. Do you know if this is currently possible? If not, is it something you would consider implementing?
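
To make the request concrete, here is a minimal sketch of the intended behavior using only the standard library. It is not a dataclass-wizard API: `PATH_TO_FIELD`, `get_path`, and `from_nested` are hypothetical names made up for illustration, and the sketch assumes paths are given as tuples of keys and indexes.

```python
from dataclasses import dataclass


@dataclass
class MyClass:
    my_str: str


# Hypothetical mapping of key path -> field name (not part of dataclass-wizard).
PATH_TO_FIELD = {
    ('parent', 'child', 'grandchild', 1, 'sibling'): 'my_str',
}


def get_path(data, path):
    """Walk nested dicts/lists one key or index at a time."""
    for key in path:
        data = data[key]
    return data


def from_nested(cls, data, path_to_field):
    """Build `cls` by looking up each field's value at its key path."""
    kwargs = {field: get_path(data, path)
              for path, field in path_to_field.items()}
    return cls(**kwargs)


data = {'parent': {'child': {'grandchild': [{'sibling': 'no'},
                                            {'sibling': 'hello'}]}}}
obj = from_nested(MyClass, data, PATH_TO_FIELD)
print(obj)  # MyClass(my_str='hello')
```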

Thank you for a terrific module btw!

iwconfig, May 21 '22 21:05

Hi @iwconfig, thanks for opening this issue! I agree that nested JSON path traversal is an interesting feature, and one that is worth supporting. I actually added a milestone a while back to help track it, and I definitely plan to implement it in one of the upcoming releases.

I will keep this thread updated as I make more progress towards the request. In any case, I've also added a 'help wanted' label to this issue in case anyone wants to take a stab at implementing the feature as well.

rnag, May 23 '22 14:05

Here is some initial work I've been able to put together so far, inspired in part by this post on Stack Overflow. It could use some slight modifications, but I am glad that it's at least working.

from functools import reduce


class JsonPath:

    @classmethod
    def get(cls, data, path):
        for p in path:
            data = data[p]

        return data

    @classmethod
    def get_v2(cls, data, path):
        """For some reason, an approach with `functools.reduce` is slower than a `for` loop"""
        return reduce(cls._get_item, path, data)

    @classmethod
    def _get_item(cls, current_data, current_path):
        return current_data[current_path]

    @classmethod
    def get_safe(cls, data, path, default=None):
        """Same as `get` but handles cases where key is missing, or index is out of bounds."""
        current_data = data
        p = path  # to avoid "unbound local variable" warnings

        try:
            for p in path:
                current_data = current_data[p]

            return current_data

        # IndexError -
        #   raised when `data` is a `list`, and we access an index that is "out of bounds"
        # KeyError -
        #   raised when `data` is a `dict`, and we access a key that is not present
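        # AttributeError -
        #   not raised by indexing built-in types (`None[p]` raises `TypeError`);
        #   kept as a guard for custom objects whose `__getitem__` may raise it
        except (IndexError, KeyError, AttributeError):
            return default

        # TypeError -
        #   raised when `data` is a `list`, but we try to use it like a `dict`,
        #   or when `data` is not subscriptable at all, such as a `None`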
        except TypeError as e:
            raise TypeError(f'Invalid path\n  '
                            f'data={data}\n  '
                            f'path={path}\n  '
                            f'current_data={current_data}\n  '
                            f'current_path={p!r}\n  '
                            f'error={e}') from None


if __name__ == '__main__':
    from timeit import timeit

    d = {"a": {"b": [1, {"c": ["d"]}, 2, ["hello world"]]}}

    assert JsonPath.get_safe(d, ['z']) is None
    assert JsonPath.get(d, ['a', 'b', 3, -1]) == 'hello world'

    data_path = ['a', 'b', 1, 'c', 0]
    data_path_invalid = ['a', 'b', 1, 'c', 321]
    get_fn = lambda x: x['a']['b'][1]['c'][0]

    assert JsonPath.get(d, data_path) == 'd'
    assert JsonPath.get_safe(d, data_path_invalid, 112233) == 112233
    assert get_fn(d) == 'd'

    n = 100_000

    print(f'get (no loop):       {timeit("get_fn(d)", globals=globals(), number=n):.3f}')
    print(f'get:                 {timeit("JsonPath.get(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get (reduce):        {timeit("JsonPath.get_v2(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get_safe:            {timeit("JsonPath.get_safe(d, data_path, 112233)", globals=globals(), number=n):.3f}')
    print(f'get_safe (invalid):  {timeit("JsonPath.get_safe(d, data_path_invalid, 112233)", globals=globals(), number=n):.3f}')
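
Which branch of `get_safe` a bad path ends up in depends on the exception that plain indexing raises. The following is a small standalone check of that, not part of the class above; `classify_lookup` is a hypothetical helper name used only for this illustration.

```python
def classify_lookup(data, path):
    """Return the name of the exception raised while walking `path`, or None."""
    try:
        for key in path:
            data = data[key]
    except (IndexError, KeyError, TypeError) as exc:
        return type(exc).__name__
    return None


d = {"a": {"b": [1, None]}}

print(classify_lookup(d, ["a", "b", 0]))     # None (path is valid)
print(classify_lookup(d, ["a", "z"]))        # KeyError (missing dict key)
print(classify_lookup(d, ["a", "b", 5]))     # IndexError (list index out of range)
print(classify_lookup(d, ["a", "b", "x"]))   # TypeError (list indexed with a str)
print(classify_lookup(d, ["a", "b", 1, 0]))  # TypeError (None is not subscriptable)
```

So with the code as posted, a missing key or an out-of-range index yields the default, while stepping into `None` or indexing a list with a string raises the "Invalid path" `TypeError`.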

rnag, May 25 '22 15:05

Oh, I didn't see that milestone, sorry. That's great!

Wow, awesome! What you've got there is just for paths separated into a list, right? I know fnc and pydash (same author) also offer this functionality, and can handle dot-delimited paths as well. Maybe you'll find something interesting in how they do it. pydash is more mature, but fnc is faster due to its generator-based approach.
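
For what it's worth, a dot-delimited string can be turned into a key list with a small regex. This is only a rough sketch and not how fnc or pydash do it: `parse_path` and `get` are hypothetical names, and it assumes keys contain no dots or brackets and that bracketed parts are always integer indexes.

```python
import re

# One token per match: a bare key (group 1) or a bracketed integer index (group 2).
_TOKEN = re.compile(r'([^.\[\]]+)|\[(-?\d+)\]')


def parse_path(path):
    """Split 'a.b[1].c' into ['a', 'b', 1, 'c'].

    Malformed pieces (e.g. '[x]' brackets) are not validated here.
    """
    keys = []
    for name, index in _TOKEN.findall(path):
        keys.append(int(index) if index else name)
    return keys


def get(data, path):
    """Accept either a dot-delimited string or an already-split list of keys."""
    keys = parse_path(path) if isinstance(path, str) else path
    for key in keys:
        data = data[key]
    return data


d = {"a": {"b": [1, {"c": ["d"]}, 2, ["hello world"]]}}

print(parse_path('parent.child.grandchild[1].sibling'))
# ['parent', 'child', 'grandchild', 1, 'sibling']
print(get(d, 'a.b[1].c[0]'))   # d
print(get(d, 'a.b[3][-1]'))    # hello world
```

Parsing the string once up front and reusing the resulting list would keep the per-lookup cost the same as the list-based `get`.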

Thank you for this! :1st_place_medal:

iwconfig, May 25 '22 19:05