Question/feature request: Map a (nested) JSON/dict Key to a Field
It is possible to map JSON keys to fields with different names. However, I would like to map nested JSON/dict values by key path, preferably as dot-delimited strings:
'parent.child.grandchild[1].sibling'
or as lists of keys
['parent', 'child', 'grandchild', 1, 'sibling']
So it would look something like
class Meta(JSONSerializable.Meta):
    json_key_to_field = {
        'parent.child.grandchild[1].sibling': 'my_str'
    }
or
json_field(
    ('myJSONKey',
     'parent.child.grandchild[1].sibling',
     ('parent2', 'child', 'grandchild', 4, 'sibling')),
    'myField'
)
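To make the second form a bit more concrete, here is a rough sketch of the lookup I have in mind: try each key or key path in turn and take the first one that resolves. The names `resolve`, `first_match` and `_MISSING` are made up for illustration and are not part of dataclass-wizard. To keep it short, the sketch only handles plain keys and tuples of keys, not dot-delimited strings.

```python
from typing import Any, Iterable, Sequence, Union

# Hypothetical names for illustration only; none of this is dataclass-wizard API.
Key = Union[str, int]
Path = Union[Key, Sequence[Key]]

_MISSING = object()  # sentinel, so that a stored `None` still counts as a match


def resolve(data: Any, path: Path) -> Any:
    """Follow a single key, or a tuple/list of keys, into nested dicts and lists."""
    # A bare string or int is treated as a one-element path.
    keys = (path,) if isinstance(path, (str, int)) else path
    for key in keys:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            return _MISSING
    return data


def first_match(data: Any, paths: Iterable[Path], default: Any = None) -> Any:
    """Return the value at the first path that resolves, else `default`."""
    for path in paths:
        value = resolve(data, path)
        if value is not _MISSING:
            return value
    return default


doc = {'parent2': {'child': {'grandchild': [0, 1, 2, 3, {'sibling': 'hello'}]}}}
paths = ['myJSONKey', ('parent2', 'child', 'grandchild', 4, 'sibling')]
print(first_match(doc, paths))  # hello
```

Here `'myJSONKey'` is not present at the top level, so the lookup falls through to the tuple path.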
As of yet, I have not been able to come up with a way to accomplish this with dataclass-wizard. Do you know if this is currently possible? If not, is it something you would consider implementing?
Thank you for a terrific module btw!
Hi @iwconfig, thanks for opening this issue! I agree that nested JSON path traversal is an interesting feature, and one that is worth supporting. I actually added a milestone a while back to track it, and I plan to implement it in one of the upcoming releases.
I will keep this thread updated as I make more progress towards the request. In any case, I've also added a 'help wanted' label to this issue in case anyone wants to take a stab at implementing the feature as well.
Here is some initial work I've been able to put together so far. It was inspired in part by this post on SO. It could use some slight modifications, but I am glad that it's at least working so far.
from functools import reduce


class JsonPath:

    @classmethod
    def get(cls, data, path):
        for p in path:
            data = data[p]
        return data

    @classmethod
    def get_v2(cls, data, path):
        """For some reason, an approach with `functools.reduce` is slower than a `for` loop"""
        return reduce(cls._get_item, path, data)

    @classmethod
    def _get_item(cls, current_data, current_path):
        return current_data[current_path]

    @classmethod
    def get_safe(cls, data, path, default=None):
        """Same as `get` but handles cases where key is missing, or index is out of bounds."""
        current_data = data
        p = path  # to avoid "unbound local variable" warnings

        try:
            for p in path:
                current_data = current_data[p]
            return current_data

        # IndexError -
        #   raised when `data` is a `list`, and we access an index that is "out of bounds"
        # KeyError -
        #   raised when `data` is a `dict`, and we access a key that is not present
        # AttributeError -
        #   kept as a safeguard (e.g. a custom `__getitem__` may raise it);
        #   note that subscripting `None` raises `TypeError` instead
        except (IndexError, KeyError, AttributeError):
            return default

        # TypeError -
        #   raised when `data` is a `list`, but we try to use it like a `dict`,
        #   or when `data` is not subscriptable at all (such as `None`)
        except TypeError as e:
            raise TypeError(f'Invalid path\n  '
                            f'data={data}\n  '
                            f'path={path}\n  '
                            f'current_data={current_data}\n  '
                            f'current_path={p!r}\n  '
                            f'error={e}') from None
if __name__ == '__main__':
    from timeit import timeit

    d = {"a": {"b": [1, {"c": ["d"]}, 2, ["hello world"]]}}

    assert JsonPath.get_safe(d, ['z']) is None
    assert JsonPath.get(d, ['a', 'b', 3, -1]) == 'hello world'

    data_path = ['a', 'b', 1, 'c', 0]
    data_path_invalid = ['a', 'b', 1, 'c', 321]
    get_fn = lambda x: x['a']['b'][1]['c'][0]

    assert JsonPath.get(d, data_path) == 'd'
    assert JsonPath.get_safe(d, data_path_invalid, 112233) == 112233
    assert get_fn(d) == 'd'

    n = 100_000
    print(f'get (no loop): {timeit("get_fn(d)", globals=globals(), number=n):.3f}')
    print(f'get: {timeit("JsonPath.get(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get (reduce): {timeit("JsonPath.get_v2(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get_safe: {timeit("JsonPath.get_safe(d, data_path, 112233)", globals=globals(), number=n):.3f}')
    print(f'get_safe (invalid): {timeit("JsonPath.get_safe(d, data_path_invalid, 112233)", globals=globals(), number=n):.3f}')
Oh, I didn't see that milestone, sorry. That's great!
Wow, awesome! What you've got there only handles paths that are already split into a list, right? I know fnc and pydash (same author) also offer this functionality, and can handle dot-delimited paths as well. Maybe you'll find something interesting in how they do it. pydash is more mature, but fnc is faster due to its generator-based approach.
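For what it's worth, turning a dot-delimited path into the list form could be roughly as simple as the sketch below. `parse_path` is a hypothetical helper I made up, not how pydash or fnc actually do it, and it assumes that keys contain no dots or brackets and that indexes are plain integers.

```python
import re
from typing import List, Union

# Hypothetical helper, not taken from pydash, fnc or dataclass-wizard.
# Assumes keys contain no dots or brackets and that indexes are plain integers.
# Anything that matches neither pattern (such as the dots) is silently skipped.
_TOKEN = re.compile(r'\[(-?\d+)\]|([^.\[\]]+)')


def parse_path(path: str) -> List[Union[str, int]]:
    """Split 'a.b[1].c' into ['a', 'b', 1, 'c']."""
    keys: List[Union[str, int]] = []
    for index, name in _TOKEN.findall(path):
        # Exactly one of the two groups is non-empty for each match.
        keys.append(int(index) if index else name)
    return keys


print(parse_path('parent.child.grandchild[1].sibling'))
# ['parent', 'child', 'grandchild', 1, 'sibling']

# The resulting list can then be walked with a plain loop.
d = {"a": {"b": [1, {"c": ["d"]}, 2, ["hello world"]]}}
value = d
for key in parse_path('a.b[1].c[0]'):
    value = value[key]
print(value)  # d
```

A real implementation would probably also want escaping for keys that contain dots, and an error on malformed input rather than skipping it.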
Thank you for this! :1st_place_medal: