Hive-JSON-Serde icon indicating copy to clipboard operation
Hive-JSON-Serde copied to clipboard

Accessing a sibling's value from a JSON Array in JSONserde

Open saidatta opened this issue 7 years ago • 2 comments

Hey,

Given This is my JSON

...
    "actions": [
      {
        "action_type": "link_click",
        "value": "1"
      },
      {
        "action_type": "page_engagement",
        "value": "1"
      },
      {
        "action_type": "post_engagement",
        "value": "1"
      },
      .....
    ]
  }
...

I would like to map each action_type's value into a specific column of my external Table

CREATE EXTERNAL TABLE IF NOT EXISTS  raw_marketing_other.facebook_ad_cost_dtl_hrly (
   ...
   linkClick : int
   pageEngagement : int
...
)

For example, linkClick column will have the value of 1 derived from that json object.

Question Is this possible in the mapping of serdeProperties? something like this.

WITH serdeproperties(
    "mapping.linkClick" = "actions[action_type="link_click].value",
...

Note: where actions[action_type="link_click]" is meant to return the first occurence of json object with that condition?

If it is not possible through custom mappings, then is there any alternative solution for it?

Thanks

saidatta avatar Aug 23 '17 18:08 saidatta

I see... I was about to tell you to try with views, but it's not that easy since you want to pick something in the array.

There are 2 ways you can do something like that, one is to conver 'action'to string and use https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object

or to write a generic UDF that takes an array of structs, and outputs the metric you want.

The SerDe does not do complex mapping, that is better done upstream. The mapping is meant for those occasions when the naming in json is not compatible with hive (different rules for case sensitivity, reserved words, etc).

R.

2017-08-23 11:10 GMT-07:00 Venkata Munnangi [email protected]:

Hey,

Given This is my JSON

... "actions": [ { "action_type": "link_click", "value": "1" }, { "action_type": "page_engagement", "value": "1" }, { "action_type": "post_engagement", "value": "1" }, ..... ] } ...

I would like to map each action_type's value into a specific column of my external Table

CREATE EXTERNAL TABLE IF NOT EXISTS raw_marketing_other.facebook_ad_cost_dtl_hrly ( ... linkClick : int pageEngagement : int ... )

Question Is this possible in the mapping of serdeProperties? something like this.

WITH serdeproperties( "mapping.linkClick" = "actions[action_type="link_click].value", ...

Note: where actions[action_type="link_click]" is meant to return the first occurence of json object with that condition?

If it is not possible through custom mappings, then is there any alternative solution for it?

Thanks

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rcongiu/Hive-JSON-Serde/issues/195, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3mtfdEUolKBPTeDs2ltGTg9Sb0Wdwvks5sbGsSgaJpZM4PAYpA .

rcongiu avatar Aug 23 '17 18:08 rcongiu

Bummer.

I can do mapping with nested map attributes, right? For example

    "actions": {
        "link_click": "1",
        "page_like": "2"
      },
      .....

and do

WITH serdeproperties(
    "mapping.linkClick" = "actions.link_click",
    "mapping.page_like" = "actions.page_like",
...

for mapping linkClick and page_like columns to the nested attributes link_click and page_like respectively.

Do I need to provide any additional flags?

saidatta avatar Aug 23 '17 19:08 saidatta