Functions to work with json structures

Open jankatins opened this issue 10 years ago • 4 comments

I've recently built a few functions to work with nested structures coming from a JSON web API.

data = {"ID1":{"result":{"name":"Jan Schulz"}},
        "ID2":{"result": {"name":"Another name", "bday":"1.1.2000"}}}
print(find_in_structure(data, "Schulz"), get_from_structure(data, find_in_structure(data, "Schulz")))
## ID1.result.name Jan Schulz
converter_dict = dict(
    names = "result.name",
    bday = "result.bday"
)
import pandas as pd
print(pd.DataFrame(convert_to_dataframe_input(data, converter_dict)))
##   _index      bday         names
## 0    ID1       NaN    Jan Schulz
## 1    ID2  1.1.2000  Another name

I just found this lib and these three functions seem to fit in with boltons.iterutils.remap and boltons.jsonutils. Would you be willing to take them in?

Code is currently here: http://janschulz.github.io/working_with_dicts.html
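For readers who don't follow the link, here is a minimal sketch of how a search like find_in_structure could work. It is not the code from the linked page: the recursion strategy, the substring match, and the `_prefix` helper parameter are all assumptions made for illustration.

```python
def find_in_structure(data, needle, _prefix=()):
    """Return the dotted path of the first string value containing `needle`.

    Hypothetical sketch: walks dicts and lists depth-first and returns None
    when nothing matches. `_prefix` is an internal helper argument.
    """
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, (list, tuple)):
        items = enumerate(data)
    else:
        items = ()  # scalars have no children to search
    for key, value in items:
        path = _prefix + (str(key),)
        if isinstance(value, str) and needle in value:
            return ".".join(path)
        found = find_in_structure(value, needle, path)
        if found is not None:
            return found
    return None


data = {"ID1": {"result": {"name": "Jan Schulz"}},
        "ID2": {"result": {"name": "Another name", "bday": "1.1.2000"}}}
schulz_path = find_in_structure(data, "Schulz")
```

With the example data from above, `schulz_path` comes out as `ID1.result.name`, matching the output shown in the issue.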

jankatins, Sep 26 '15 12:09

I find myself doing similar operations. @JanSchulz have you tried to express some of your functions in terms of functions available in toolz (toolz.dicttoolz) or funcy?

There's a lot of overlapping functionality between boltons, toolz, and funcy; I'm not sure which package to use for these kinds of operations. Perhaps others have advice?

ariddell, Sep 26 '15 14:09

Oh, yes, these very much fit in. In fact, I've got TODOs in iterutils and I'd already started on a couple of experimental implementations in my IPython notebook, but I haven't found the right one yet. That said, my current get_path is almost identical to your get_from_structure, except that, in addition to the string form, it also accepts a tuple/list version of the path for better compatibility with remap (and generally easier programmatic path construction). There's also handling for the corner case of numeric string keys in dictionaries. In other words, I very much approve! :)
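To make the string-or-tuple idea concrete, here is a rough sketch of such a lookup. It is not the experimental get_path mentioned above; the `default` parameter, the `_step` helper, and the rule for numeric segments (try the literal key first, then fall back to an integer) are assumptions for illustration.

```python
_MISSING = object()  # sentinel so that None can be a legitimate default


def _step(container, segment):
    """Resolve one path segment against a dict, list, or tuple (hypothetical helper)."""
    if isinstance(container, dict):
        if segment in container:
            return container[segment]
        # Corner case: the path says "1" but the dict key is the integer 1.
        if isinstance(segment, str) and segment.lstrip("-").isdigit():
            return container[int(segment)]
        raise KeyError(segment)
    if isinstance(container, (list, tuple)):
        return container[int(segment)]
    raise TypeError(f"cannot descend into {type(container).__name__}")


def get_path(root, path, default=_MISSING):
    """Follow `path` (dotted string or tuple/list of segments) into `root`."""
    segments = path.split(".") if isinstance(path, str) else list(path)
    current = root
    for segment in segments:
        try:
            current = _step(current, segment)
        except (KeyError, IndexError, TypeError, ValueError):
            if default is _MISSING:
                raise LookupError(f"could not resolve {segment!r} in {path!r}")
            return default
    return current


data = {"ID1": {"result": {"name": "Jan Schulz"}},
        "ID2": {"result": {"name": "Another name", "bday": "1.1.2000"}}}
by_string = get_path(data, "ID1.result.name")
by_tuple = get_path(data, ("ID2", "result", "bday"))
```

The tuple form avoids any ambiguity when a key itself contains a dot, which a dotted string cannot express.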

@ariddell So I discovered toolz a long time ago, before boltons for sure, and I was quite excited. But while on the whole it has more operations in this domain, I found myself barely using any of them. toolz tries hard to provide a certain functional paradigm, and I didn't find the operations that much clearer than the standard ones.

In this specific case, I think remap has no equivalent in either library, as it's built for customizable processing of arbitrarily nested data. Most toolz functions I've seen are (for others) handy conventions for dealing with one or two levels of nesting.

mahmoud, Sep 26 '15 18:09

Totally agree, there is a whole family of functions available here, all slightly different. It is hard to tell what the "complete" set of operations would be.

find_in_structure and get_from_structure are examples of operating with paths, which there are many ways to do. ("Path" meaning a sequence of attribute / item accesses which can be stored and applied to any object.) One implementation I've made works with future objects -- a path would be stored expressing what data to access on the future response.
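One way to picture a path as "a stored sequence of attribute / item accesses" is the sketch below. It is not the futures-based implementation described above; the `Path` class, the `apply_path` function, and the sample objects are all hypothetical.

```python
from types import SimpleNamespace


class Path:
    """Records attribute and item accesses so they can be replayed later."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only called for names that Path itself does not define.
        if name.startswith("_"):
            raise AttributeError(name)
        return Path(self._steps + (("attr", name),))

    def __getitem__(self, key):
        return Path(self._steps + (("item", key),))


def apply_path(path, obj):
    """Replay the recorded accesses against `obj` and return the result."""
    for kind, key in path._steps:
        obj = getattr(obj, key) if kind == "attr" else obj[key]
    return obj


# The same kind of path works on plain dicts...
data = {"ID1": {"result": {"name": "Jan Schulz"}}}
name_from_dict = apply_path(Path()["ID1"]["result"]["name"], data)

# ...and on objects that mix attributes and items.
response = SimpleNamespace(payload=SimpleNamespace(rows=[{"id": 7}]))
first_id = apply_path(Path().payload.rows[0]["id"], response)
```

Because a `Path` is built before any data exists, it can be created up front and applied once a response arrives.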

I'm hoping there is some clean abstraction that covers many cases, something like SQL for trees. Systems like JSONiq and XQuery (http://www.jsoniq.org/) make a good reference point. Their downsides are that they require an external library and, I think, that operating on "result sets" is the wrong abstraction.

There are some other examples out there too. This is a fun research project for me for now :-) Maybe something will come of it.

kurtbrose, Sep 26 '15 18:09

@mahmoud the tuple-as-path variant was what drew me to this bugtracker :-) I've no problem rebuilding my variant with that in mind, but I think that you already have them :-)

convert_to_dataframe_input can probably be redone with a function that wraps remap, but IMO this results in more overhead (many more comparisons, as all possible paths in the structure have to be checked instead of only some specific ones being "tried"). Reading your TODO, I like the idea of "collecting" something in the structure, so collect(data, pathes) sounds fine, too.

I can submit this as a PR (as an addition to iterutils) if that's easier to discuss.
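A rough sketch of the "try only the specific paths" approach, to show why it needs fewer comparisons than walking the whole structure. The signature is an assumption: here `paths` is a dict of column name to dotted path, like converter_dict above, and `_lookup` and `index_name` are hypothetical helpers.

```python
def _lookup(record, segments, default=None):
    """Follow dict keys in order; return `default` as soon as one is missing."""
    current = record
    for segment in segments:
        if not isinstance(current, dict) or segment not in current:
            return default
        current = current[segment]
    return current


def collect(data, paths, index_name="_index"):
    """Build one flat row per top-level entry, reading only the requested paths."""
    rows = []
    for index, record in data.items():
        row = {index_name: index}
        for column, path in paths.items():
            row[column] = _lookup(record, path.split("."))
        rows.append(row)
    return rows


data = {"ID1": {"result": {"name": "Jan Schulz"}},
        "ID2": {"result": {"name": "Another name", "bday": "1.1.2000"}}}
converter_dict = dict(names="result.name", bday="result.bday")
rows = collect(data, converter_dict)
```

Each record costs one lookup per path segment, regardless of how many other keys it contains. The resulting list of dicts is a shape that `pd.DataFrame` accepts directly.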

jankatins, Sep 26 '15 18:09