Hive-JSON-Serde
Hive-JSON-Serde copied to clipboard
get_json_object on Json String does not work with Serde
My sample table definition is as follows create external table if not exists test ( ctx string, test int )PARTITIONED BY (game_id STRING) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
Input data: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"}
I want to avoid creating struct for every json string , so I am using string for ctx instead of struct , When I query ctx using get_json_object ,
select get_json_object(ctx,'$.reason') from test where game_id='xxx';
I get following exception
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFJson.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.UDFJson@662d167e of class org.apache.hadoop.hive.ql.udf.UDFJson with arguments {{"reason":"184","initial":"105","delta":"5","name":"62"}:org.openx.data.jsonserde.json.JSONObject, $.reason:java.lang.String} of size 2 at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1241) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more Caused by: java.lang.IllegalArgumentException: argument type mismatch at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1217) ... 18 more
It converts string to org.openx.data.jsonserde.json.JSONObject .
Any pointers why it does not work or why/where the conversion happen?
That's not how the SerDe works. You can declare your table as a normal table without the json serde if you want to use get_json_object.
If you want to use the json serde, in this case, you should declare your table as create external table if not exists test (ctx structreason:int,initial:int,name:int,delta:int, test int )PARTITIONED BY (game_id STRING) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
and then you can do select ctx.reason from test ...
"Good judgment comes from experience.
Experience comes from bad judgment"
Data Engineer - OpenX.org Pasadena, CA Skype: sardodazione Y! IM: rcongiu
On Monday, May 19, 2014 1:29 PM, Snehal Nagmote [email protected] wrote:
My table looks something like this create external table if not exists test ( ctx string, test int )PARTITIONED BY (game_id STRING) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; Input data: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"} I want to avoid creating struct for every json string , so I define json string in here .When I query ctx using get_json_object , select get_json_object(ctx,'$.reason') from test where game_id='xxx'; I get following exception Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFJson.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.UDFJson@662d167 of class org.apache.hadoop.hive.ql.udf.UDFJson with arguments {{"reason":"184","initial":"105","delta":"5","name":"62"}:org.openx.data.jsonserde.json.JSONObject, $.reason:java.lang.String} of size 2 at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1241) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more Caused by: java.lang.IllegalArgumentException: argument type mismatch at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1217) ... 18 more It converts string to org.openx.data.jsonserde.json.JSONObject . Any pointers why it does not work ? — Reply to this email directly or view it on GitHub.
It seems like the primary use case for a SerDe like this one would be for manipulating semi-structured data. Sometimes this means that we will know all the options a-priori. But what happens when someone wants to start sending random things in these rows? That is to say, most of the time you see a row that looks like so:
{"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"}
However, someone decided they were going to send a few rows that look like this:
{"ctx":{"reason":184,"initial":105,"name":62,"delta":5,"something:"special"}, "test":123,"game_id":"xxx"}
Right now this is doable by exposing two tables:
create external table raw (
ctx string,
game_id string,
test int
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 'Some/Specific/Location';
And
create external table convenience (
ctx struct<
name:string,
reason:string,
delta:int,
initial:int
>,
game_id string,
test int
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 'Some/Specific/Location';
So the overall theory is that someone can get a random value that they put somewhere by issuing a query like this:
select get_json_object(ctx, '$.name') as name, get_json_object(ctx, '$.special) as special
from raw
where get_json_object(ctx, '$.special) is not null;
Is there truly no way to use get_json_object on a field declared to be a string inside the SerDe?
In the current architecture, the json object is parsed, and in this case, 'ctx' is mapped to a JSON Object, which is normally read by a JSON Struct Object Inspector. To be read as a string, I'd need to write a String Object Inspector that can handle JSONObjects.
As a quick alternative (but I am not sure it works) you could force the JSON object to be serialized as a string with a cast. select get_json_object(CAST(ctx as string), '$.name') as name, get_json_object(CAST(ctx as string), '$.special) as special
"Good judgment comes from experience.
Experience comes from bad judgment"
Data Engineer - OpenX.org Pasadena, CA Skype: sardodazione Y! IM: rcongiu
On Monday, May 19, 2014 1:57 PM, Mark Roberts [email protected] wrote:
It seems like the primary use case for a SerDe like this one would be for manipulating semi-structured data. Sometimes this means that we will know all the options a-priori. But what happens when someone wants to start sending random things in these rows? That is to say, most of the time you see a row that looks like so: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5}, "test":123,"game_id":"xxx"} However, someone decided they were going to send a few rows that look like this: {"ctx":{"reason":184,"initial":105,"name":62,"delta":5,"something:"special"}, "test":123,"game_id":"xxx"} Right now this is doable by exposing two tables: create external table raw ( ctx string, game_id string, test int ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 'Some/Specific/Location'; And create external table convenience ( ctx struct< name:string, reason:string, delta:int, initial:int >, game_id string, test int ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 'Some/Specific/Location'; So the overall theory is that someone can get a random value that they put somewhere by issuing a query like this: select get_json_object(ctx, '$.name') as name, get_json_object(ctx, '$.special) as special from raw where get_json_object(ctx, '$.special) is not null; Is there truly no way to use get_json_object on a field declared to be a string inside the SerDe? — Reply to this email directly or view it on GitHub.
I didn't have any luck with the cast workaround for get_json_object. But "lateral view json_tuple" seems to be working.