Hive-JSON-Serde
Hive-JSON-Serde copied to clipboard
Serde Issues with array at the top level
Hi,
Given a dynamic-sized array of the form [{"aa" : "bb"}, {"aa" : "cc"}, {"aa":"dd"}] There seem to be no solution with the serde other than defining 3 struct. an implementation of cretae external table x (a array<structaa:string>) will no work. The question is why, and if this can be fixed. Since other than that it seems like the perfect solution for json handling.
create table x ( a array <struct aa:string> ) will work fine with your example
'Create table x (a array <struct aa:string>)'
Sorry on cell phone.. can't find bacticks on this keyboard
I probably do not understand how.Try using this file " [{xxx:{"name":"sergay"}},{xxx:{"name":"zzz"}}] [{xxx:{"name":"sergay1"}},{xxx:{"name":"zzzb"}}] "
then use the following external tables (assuming the file is in path /user/cloudera/json drop table json_test1; create external table json_test1 ( a array<struct<xxx:structname:string>> ) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as TEXTFILE LOCATION '/user/cloudera/json'; drop table json_test2; create external table json_test2 ( a structxxx:structname:string, b structxxx:structname:string ) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as TEXTFILE LOCATION '/user/cloudera/json';
see what you get as you select from both table (while the 2nd works fine, the 1st will result in error) Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.openx.data.jsonserde.json.JSONObject cannot be cast to org.openx.data.jsonserde.json.JSONArray
So it does not seem to work, unfortunately
Oh I see. So, I coded a while ago a feature to support an array as top level object instead of a struct. You can change the data to a struct { col: [...] } to make it work
I am facing same error. My JSON look like this.
[{
"test": "a",
"world": "b",
"time": [{
"start": "b",
"end": "99"
}]
}]
This one should work, declared as array<struct<test:string,world:string,time:<array<structstart:string,end:string>>>>
"Good judgment comes from experience. Experience comes from bad judgment"
On Thursday, January 5, 2017 8:13 AM, Anders Boje Larsen <[email protected]> wrote:
I am facing same error. My JSON look like this.[{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }] — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
I have the same issue different error
I reproduced the issue even for the json @boje provided with @rcongiu declaration;
[{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }]
table I try to create is;
create external table message( msg array<struct<test:string,world:string>> ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'
I get this error;
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]
Any help would be truly appreciated
It should support top level arrays. Have you tried declaring the table as create external table message( msg structtest:string,world:string ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'
? ------------------------------------------------------- "Good judgment comes from experience. Experience comes from bad judgment"
On Tuesday, January 17, 2017 2:03 PM, Bilal Nemutlu <[email protected]> wrote:
I have the same issue different errorI reproduced the issue even for the json @boje provided with @rcongiu declaration; [{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }]table I try to create is; create external table message( msg array<structtest:string,world:string> ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'I get this error;Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]Any help would be truly appreciated— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
I got the problem that you helped resolved here
However I see another issue, when i query the table, it returns the first object in that array
the file has the following one line json array
[{"event":{"id":"1","type":"c"}},{"event":{"id":"2","type":"d"}}]
table declaration msg array<struct<event:struct<id:string, type:string>>>
it returns only this;
[{"event":{"id":"1","type":"c"}}]
i checked the count(*), it shows 1 row only
Thank you bunch for the help
As far as i see, it considers the input json as just one row and it returns the first object within that!
I also tried struct msg struct<event:struct<id:string, type:string>>
but no luck.
changed the json structure as [{"id":"1","type":"c"},{"id":"2","type":"d"}]
and table declaration msg struct<id:string, type:string>
no luck again and just returns the first one {"id":"1","type":"c"}
I think i am very close to finish it up with your serde and I'd really appreciate your support!
Hi eastTowest ,
Were you able to solve this, i am facing the same issue.
Did anyone solve this issue. I am facing similar issue with top level array. can anyone suggest me a solution.
[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"},{"user_id": "e"," previous_user_id": "f"}] create external table array_tmp (User array<struct<user_id: String, previous_user_id:String>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
on select user.user_id from array_tmp it gives exception as Row is not a valid JSON Object