Hive-JSON-Serde icon indicating copy to clipboard operation
Hive-JSON-Serde copied to clipboard

Serde Issues with array at the top level

Open RaniRaven opened this issue 8 years ago • 14 comments

Hi,

Given a dynamic-sized array of the form [{"aa" : "bb"}, {"aa" : "cc"}, {"aa":"dd"}] There seem to be no solution with the serde other than defining 3 struct. an implementation of cretae external table x (a array<structaa:string>) will no work. The question is why, and if this can be fixed. Since other than that it seems like the perfect solution for json handling.

RaniRaven avatar May 25 '16 09:05 RaniRaven

create table x ( a array <struct aa:string> ) will work fine with your example

rcongiu avatar May 25 '16 15:05 rcongiu

'Create table x (a array <struct aa:string>)'

rcongiu avatar May 25 '16 15:05 rcongiu

Sorry on cell phone.. can't find bacticks on this keyboard

rcongiu avatar May 25 '16 15:05 rcongiu

I probably do not understand how.Try using this file " [{xxx:{"name":"sergay"}},{xxx:{"name":"zzz"}}] [{xxx:{"name":"sergay1"}},{xxx:{"name":"zzzb"}}] "

then use the following external tables (assuming the file is in path /user/cloudera/json drop table json_test1; create external table json_test1 ( a array<struct<xxx:structname:string>> ) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as TEXTFILE LOCATION '/user/cloudera/json'; drop table json_test2; create external table json_test2 ( a structxxx:structname:string, b structxxx:structname:string ) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as TEXTFILE LOCATION '/user/cloudera/json';

see what you get as you select from both table (while the 2nd works fine, the 1st will result in error) Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.openx.data.jsonserde.json.JSONObject cannot be cast to org.openx.data.jsonserde.json.JSONArray

RaniRaven avatar May 26 '16 16:05 RaniRaven

So it does not seem to work, unfortunately

RaniRaven avatar May 26 '16 16:05 RaniRaven

Oh I see. So, I coded a while ago a feature to support an array as top level object instead of a struct. You can change the data to a struct { col: [...] } to make it work

rcongiu avatar May 26 '16 17:05 rcongiu

I am facing same error. My JSON look like this.

[{
	"test": "a",
	"world": "b",
	"time": [{
		"start": "b",
		"end": "99"
	}]
}]

boje avatar Jan 05 '17 16:01 boje

This one should work, declared as array<struct<test:string,world:string,time:<array<structstart:string,end:string>>>>


"Good judgment comes from experience. Experience comes from bad judgment"

On Thursday, January 5, 2017 8:13 AM, Anders Boje Larsen <[email protected]> wrote:

I am facing same error. My JSON look like this.[{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }] — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

rcongiu avatar Jan 05 '17 16:01 rcongiu

I have the same issue different error

I reproduced the issue even for the json @boje provided with @rcongiu declaration; [{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }]

table I try to create is; create external table message( msg array<struct<test:string,world:string>> ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'

I get this error;

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]

Any help would be truly appreciated

eastTowest avatar Jan 17 '17 22:01 eastTowest

It should support top level arrays. Have you tried declaring the table as  create external table message( msg structtest:string,world:string ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'

? ------------------------------------------------------- "Good judgment comes from experience. Experience comes from bad judgment"

On Tuesday, January 17, 2017 2:03 PM, Bilal Nemutlu <[email protected]> wrote:

I have the same issue different errorI reproduced the issue even for the json @boje provided with @rcongiu declaration; [{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }]table I try to create is; create external table message( msg array<structtest:string,world:string> ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'I get this error;Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]Any help would be truly appreciated— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

rcongiu avatar Jan 17 '17 22:01 rcongiu

I got the problem that you helped resolved here

However I see another issue, when i query the table, it returns the first object in that array

the file has the following one line json array [{"event":{"id":"1","type":"c"}},{"event":{"id":"2","type":"d"}}]

table declaration msg array<struct<event:struct<id:string, type:string>>> it returns only this; [{"event":{"id":"1","type":"c"}}]

i checked the count(*), it shows 1 row only

Thank you bunch for the help

eastTowest avatar Jan 17 '17 23:01 eastTowest

As far as i see, it considers the input json as just one row and it returns the first object within that! I also tried struct msg struct<event:struct<id:string, type:string>> but no luck.

changed the json structure as [{"id":"1","type":"c"},{"id":"2","type":"d"}]and table declaration msg struct<id:string, type:string> no luck again and just returns the first one {"id":"1","type":"c"}

I think i am very close to finish it up with your serde and I'd really appreciate your support!

eastTowest avatar Jan 17 '17 23:01 eastTowest

Hi eastTowest ,

Were you able to solve this, i am facing the same issue.

sushantkhanayat avatar Nov 07 '17 10:11 sushantkhanayat

Did anyone solve this issue. I am facing similar issue with top level array. can anyone suggest me a solution.

[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"},{"user_id": "e"," previous_user_id": "f"}] create external table array_tmp (User array<struct<user_id: String, previous_user_id:String>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

on select user.user_id from array_tmp it gives exception as Row is not a valid JSON Object

sindhujadhamodaran avatar Mar 08 '18 22:03 sindhujadhamodaran