bigquery-utils
bigquery-utils copied to clipboard
Request: Handling for objects in json_extract_values function
Issue:
bqutils.fn.json_extract_values
returns unusable values [object Object]
when children are objects / sub collections.
Example:
Sample query:
-- (select body from`bigquery-public-data.geo_whos_on_first.geojson` limit 100 )
with geojson as
(select """{"id": 1259392143, "type": "Feature", "properties": {"edtf:cessation": "uuuu", "edtf:inception": "uuuu", "geom:area": 0.0, "geom:area_square_m": 0.0, "geom:bbox": "116.97547,32.41412,116.97547,32.41412", "geom:latitude": 32.41412, "geom:longitude": 116.97547, "gn:admin1_code": "1.0", "gn:asciiname": "Liuyingcun", "gn:country_code": "CN", "gn:dem": 25, "gn:feature_class": "P", "gn:feature_code": "PPL", "gn:geonameid": 10417093, "gn:latitude": 32.41412, "gn:longitude": 116.97547, "gn:modification_date": "2015-08-08", "gn:name": "Liuyingcun", "gn:population": 0, "gn:timezone": "Asia/Shanghai", "iso:country": "CN", "mz:hierarchy_label": 1, "mz:is_current": -1, "name:zho_x_preferred": ["\u5218\u90e2\u6751"], "src:geom": "geonames", "wof:belongsto": [85669739, 102191569, 85632695, 136253041, 890516233], "wof:breaches": [], "wof:concordances": {"gn:id": 10417093}, "wof:country": "CN", "wof:geomhash": "cabdc55714fc98313d2246b1c4f7fb0b", "wof:hierarchy": [{"continent_id": 102191569, "country_id": 85632695, "county_id": 890516233, "empire_id": 136253041, "locality_id": 1259392143, "region_id": 85669739}], "wof:id": 1259392143, "wof:lastmodified": 1537613321, "wof:name": "Liuyingcun", "wof:parent_id": 890516233, "wof:placetype": "locality", "wof:repo": "whosonfirst-data", "wof:superseded_by": [], "wof:supersedes": [], "wof:tags": []}, "bbox": [116.97547, 32.41412, 116.97547, 32.41412], "geometry": {"coordinates": [116.97547, 32.41412], "type": "Point"}}""" as body)
SELECT bqutil.fn.json_extract_keys(body) as jkeys, bqutil.fn.json_extract_values(body) as jvalues
FROM geojson
I used a single row here because the public dataset is 14GB and unpartitioned.
Desired Output:
bqutils.fn.json_extract_values
returns a usable value (even just a json string) of the contents of those objects.
Thanks!
Alternative might be to build something like this as a whole separate UDF:
-- extract all key value pairs as an array from a json dict
-- input: json string with a dictionary
-- returns: list of struct <key, value>
CREATE TEMP FUNCTION EXTRACT_KV_PAIRS(json_str STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>>
LANGUAGE js AS """
try{
const json_dict = JSON.parse(json_str);
const all_kv = Object.entries(json_dict).map(
(r)=>Object.fromEntries([["key", r[0]],["value",
JSON.stringify(r[1])]]));
return all_kv;
} catch(e) { return [{"key": "error","value": e}];}
""";
Source: https://medium.com/google-cloud/extracting-json-key-value-pairs-in-bigquery-1bb9d0ec0b6d
@plaflamme Just contributed the json_extract_key_value_pairs function in #408 which should help with this use case, please let us know if there are any issues