
[Python] Allow parsing more general JSON formats

Open asfimport opened this issue 6 years ago • 2 comments

I have JSON data where the columnar (line-delimited) part is in a data subkey:


{
  "metadata": {"name": "block1"},
  "data" : [
    {"a": 1, "b": 2.0, "c": "foo", "d": false},
    {"a": 4, "b": -5.5, "c": null, "d": true}
  ]
}

 

 

It would be good if the Arrow JSON parser allowed specifying where the columnar data is stored.

Since the metadata is also important to me, it would be even better if the rest of the JSON could be returned as a Python dict, with only the specified keys parsed as Arrow tables, e.g.

 


>>> block1 = json.read_json(fn, tables=['data'])
>>> block1['data']
pyarrow.Table
a: int64
b: double
c: string
d: bool
>>> block1['metadata']
{'name': 'block1'}
>>> block1
{
  "metadata": {"name": "block1"},
  "data" : pyarrow.Table
}
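
A minimal sketch of how the behaviour requested above could be approximated in user code today. `read_json_with_tables` and its `to_table` parameter are hypothetical names, not part of pyarrow, and the helper takes JSON text rather than a filename to stay short. The only pyarrow call assumed is `pyarrow.Table.from_pylist`, which builds a table from a list of row dicts.

```python
import json
from typing import Any, Callable, Iterable, Optional


def read_json_with_tables(
    text: str,
    tables: Iterable[str],
    to_table: Optional[Callable[[list], Any]] = None,
) -> dict:
    """Parse `text` as JSON and convert the listed top-level keys to tables.

    Hypothetical helper, not part of pyarrow. By default each listed key is
    converted with pyarrow.Table.from_pylist; all other keys stay plain Python.
    """
    if to_table is None:
        # Imported lazily so that a custom converter can be used without pyarrow.
        import pyarrow as pa

        to_table = pa.Table.from_pylist

    document = json.loads(text)
    if not isinstance(document, dict):
        raise TypeError("expected a JSON object at the top level")
    for key in tables:
        # Each listed key is assumed to hold a JSON array of records.
        document[key] = to_table(document[key])
    return document
```

With pyarrow installed, `read_json_with_tables(raw_text, tables=["data"])` would be expected to return a dict whose `metadata` entry is untouched and whose `data` entry is a `pyarrow.Table`. Note that this parses the whole document with the standard-library `json` module first, so it does not get the performance of Arrow's native reader.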

 

 

Reporter: Dave Hirschfeld / @dhirschfeld

Note: This issue was originally created as ARROW-5568. Please see the migration documentation for further details.

asfimport, Jun 12 '19 05:06

Joris Van den Bossche / @jorisvandenbossche:

> I have JSON data where the columnar (line-delimited) part is in a data subkey:

Note that the data subpart is not line delimited, but a comma-delimited JSON array. So that's a first thing that would be good to support.
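
To illustrate the difference, here is a rough workaround sketch that rewrites the array under `data` as line-delimited JSON, which is the layout the existing reader expects. The function name `array_to_ndjson` is made up for this example, and the final pyarrow call is left as a comment because it assumes pyarrow is installed.

```python
import json


def array_to_ndjson(document: dict, key: str = "data") -> bytes:
    """Serialize the list of records under `key` as newline-delimited JSON."""
    records = document[key]
    if not isinstance(records, list):
        raise TypeError(f"expected a JSON array under {key!r}")
    # One compact JSON object per line instead of one comma-delimited array.
    lines = (json.dumps(record, separators=(",", ":")) for record in records)
    return "\n".join(lines).encode("utf-8") + b"\n"


raw = """
{
  "metadata": {"name": "block1"},
  "data": [
    {"a": 1, "b": 2.0, "c": "foo", "d": false},
    {"a": 4, "b": -5.5, "c": null, "d": true}
  ]
}
"""
ndjson = array_to_ndjson(json.loads(raw))
print(ndjson.decode("utf-8"), end="")

# With pyarrow installed, the buffer could then be handed to the existing reader:
#   import io
#   from pyarrow import json as pa_json
#   table = pa_json.read_json(io.BytesIO(ndjson))
```

This round-trips the data through Python objects, so it is only a stopgap for small files. Native support for JSON arrays in the reader would avoid the extra parse and serialize step.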

Some additional resources that might be useful: pandas supports many formats, called "orients"; see the overview table at http://pandas.pydata.org/pandas-docs/version/0.24/user_guide/io.html#reading-json (disclaimer: I don't know how common the different formats are, so it doesn't necessarily make sense to copy them all from pandas).

One of the formats is the JSON Table Schema (https://frictionlessdata.io/specs/table-schema/), which is a JSON file with 'metadata' and 'data' top-level keys, where 'data' consists of comma-delimited records (so very similar in structure to what @dhirschfeld showed above).

asfimport, Jun 12 '19 06:06

This issue hasn't had activity in a long time. If it's still being worked on, please leave a comment. Otherwise, it will be closed on 23rd June.

Labelled Status: Stale-Warning for tracking.

thisisnic, Jun 21 '25 08:06