Hive-JSON-Serde icon indicating copy to clipboard operation
Hive-JSON-Serde copied to clipboard

get_json_object on Json String does not work with Serde

Open snehalNagmote opened this issue 10 years ago • 4 comments

My sample table definition is as follows create external table if not exists test ( ctx string, test int )PARTITIONED BY (game_id STRING) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

Input data: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"}

I want to avoid creating struct for every json string , so I am using string for ctx instead of struct , When I query ctx using get_json_object ,

select get_json_object(ctx,'$.reason') from test where game_id='xxx';

I get following exception

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFJson.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.UDFJson@662d167e of class org.apache.hadoop.hive.ql.udf.UDFJson with arguments {{"reason":"184","initial":"105","delta":"5","name":"62"}:org.openx.data.jsonserde.json.JSONObject, $.reason:java.lang.String} of size 2 at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1241) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more Caused by: java.lang.IllegalArgumentException: argument type mismatch at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1217) ... 18 more

It converts string to org.openx.data.jsonserde.json.JSONObject .

Any pointers why it does not work or why/where the conversion happen?

snehalNagmote avatar May 19 '14 20:05 snehalNagmote

That's not how the SerDe works. You can declare your table as a normal table without the json serde if you want to use get_json_object.

If you want to use the json serde, in this case, you should declare your table as   create external table if not exists test (ctx structreason:int,initial:int,name:int,delta:int, test int )PARTITIONED BY (game_id STRING) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

and then you can do select ctx.reason from test ...


"Good judgment comes from experience.

Experience comes from bad judgment"

Data Engineer - OpenX.org Pasadena, CA Skype: sardodazione Y! IM: rcongiu

On Monday, May 19, 2014 1:29 PM, Snehal Nagmote [email protected] wrote:

My table looks something like this create external table if not exists test ( ctx string, test int )PARTITIONED BY (game_id STRING) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; Input data: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"} I want to avoid creating struct for every json string , so I define json string in here .When I query ctx using get_json_object , select get_json_object(ctx,'$.reason') from test where game_id='xxx'; I get following exception Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFJson.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.UDFJson@662d167 of class org.apache.hadoop.hive.ql.udf.UDFJson with arguments {{"reason":"184","initial":"105","delta":"5","name":"62"}:org.openx.data.jsonserde.json.JSONObject, $.reason:java.lang.String} of size 2 at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1241) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more Caused by: java.lang.IllegalArgumentException: argument type mismatch at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1217) ... 18 more It converts string to org.openx.data.jsonserde.json.JSONObject . Any pointers why it does not work ? — Reply to this email directly or view it on GitHub.

rcongiu avatar May 19 '14 20:05 rcongiu

It seems like the primary use case for a SerDe like this one would be for manipulating semi-structured data. Sometimes this means that we will know all the options a-priori. But what happens when someone wants to start sending random things in these rows? That is to say, most of the time you see a row that looks like so:

{"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"}

However, someone decided they were going to send a few rows that look like this:

{"ctx":{"reason":184,"initial":105,"name":62,"delta":5,"something:"special"}, "test":123,"game_id":"xxx"}

Right now this is doable by exposing two tables:

create external table raw (
    ctx string,
    game_id string,
    test int
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 'Some/Specific/Location';

And

create external table convenience (
    ctx struct<
        name:string,
        reason:string,
        delta:int,
        initial:int
    >,
    game_id string,
    test int
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 'Some/Specific/Location';

So the overall theory is that someone can get a random value that they put somewhere by issuing a query like this:

select get_json_object(ctx, '$.name') as name, get_json_object(ctx, '$.special) as special
from raw
where get_json_object(ctx, '$.special) is not null;

Is there truly no way to use get_json_object on a field declared to be a string inside the SerDe?

wizzat avatar May 19 '14 20:05 wizzat

In the current architecture, the json object is parsed, and in this case, 'ctx' is mapped to a JSON Object, which is normally read by a JSON Struct Object Inspector. To be read as a string, I'd need to write a String Object Inspector that can handle JSONObjects.

As a quick alternative (but I am not sure it works) you could force the JSON object to be serialized as a string with a cast. select get_json_object(CAST(ctx as string), '$.name') as name, get_json_object(CAST(ctx as string), '$.special) as special

 

"Good judgment comes from experience.

Experience comes from bad judgment"

Data Engineer - OpenX.org Pasadena, CA Skype: sardodazione Y! IM: rcongiu

On Monday, May 19, 2014 1:57 PM, Mark Roberts [email protected] wrote:

It seems like the primary use case for a SerDe like this one would be for manipulating semi-structured data. Sometimes this means that we will know all the options a-priori. But what happens when someone wants to start sending random things in these rows? That is to say, most of the time you see a row that looks like so: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"} However, someone decided they were going to send a few rows that look like this: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5,"something:"special"}, "test":123,"game_id":"xxx"} Right now this is doable by exposing two tables: create external table raw ( ctx string, game_id string, test int ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 'Some/Specific/Location'; And create external table convenience ( ctx struct< name:string, reason:string, delta:int, initial:int >, game_id string, test int ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 'Some/Specific/Location'; So the overall theory is that someone can get a random value that they put somewhere by issuing a query like this: select get_json_object(ctx, '$.name') as name, get_json_object(ctx, '$.special) as special from raw where get_json_object(ctx, '$.special) is not null; Is there truly no way to use get_json_object on a field declared to be a string inside the SerDe? — Reply to this email directly or view it on GitHub.

rcongiu avatar May 19 '14 21:05 rcongiu

I didn't have any luck with the cast workaround for get_json_object. But "lateral view json_tuple" seems to be working.

furmangg avatar Aug 14 '15 03:08 furmangg