Hive-JSON-Serde

Can I process multiline JSON data in Hive?

Open debuggerrr opened this issue 7 years ago • 5 comments

I am able to create a table and process single-line JSON data, but my question is: can I process JSON data like the example below?

[{
	"field1": "data1",
	"field2": 100,
	"field3": "more data1",
	"field4": 123.001
}, {
	"field1": "data2",
	"field2": 200,
	"field3": "more data2",
	"field4": 123.002
}, {
	"field1": "data3",
	"field2": 300,
	"field3": "more data3",
	"field4": 123.003
}, {
	"field1": "data4",
	"field2": 400,
	"field3": "more data4",
	"field4": 123.004
}]

I have read that multiline JSON data wasn't supported in Hive, but can I use it now? If yes, please share links where I can read about this. I have searched a lot but couldn't find any relevant material. Thanks in advance.

debuggerrr avatar Nov 12 '17 13:11 debuggerrr

Do you have an actual file sample? The SerDe won't work unless there's one JSON record per line. R.
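
One way to get from the array in the question to one JSON record per line is to rewrite the file before loading it. Below is a minimal Python sketch of that preprocessing step. The function name `array_to_json_lines` and the inline sample are made up for illustration, and it assumes the whole file fits in memory and that the top level is a JSON array of objects.

```python
import json


def array_to_json_lines(text):
    """Turn a JSON array of objects into text with one object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without `indent` keeps each record on a single line.
    return "".join(json.dumps(record) + "\n" for record in records)


# Two records from the question, in the multiline layout.
sample = """[{
    "field1": "data1",
    "field2": 100,
    "field3": "more data1",
    "field4": 123.001
}, {
    "field1": "data2",
    "field2": 200,
    "field3": "more data2",
    "field4": 123.002
}]"""

print(array_to_json_lines(sample), end="")
```

The output could then be written to a file and used as the table's data, since each line is now a complete JSON object.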

"Good judgment comes from experience. Experience comes from bad judgment"

On Sunday, November 12, 2017, 5:06:32 AM PST, debuggerrr wrote:

I want to load the JSON data below, and for that I am trying to create a table as follows:

CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' ;

I added the following JAR files:

  • hive-json-serde.jar
  • hive-json-serde-0.2.jar
  • json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
  • json-serde-1.3.jar

But it gives me the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory

I want to load the data below:

[{
	"field1": "data1",
	"field2": 100,
	"field3": "more data1",
	"field4": 123.001
}, {
	"field1": "data2",
	"field2": 200,
	"field3": "more data2",
	"field4": 123.002
}, {
	"field1": "data3",
	"field2": 300,
	"field3": "more data3",
	"field4": 123.003
}, {
	"field1": "data4",
	"field2": 400,
	"field3": "more data4",
	"field4": 123.004
}]

I went through many links on Stack Overflow and GitHub, but to no avail. Please help me with this and let me know where I am going wrong.

rcongiu avatar Nov 13 '17 04:11 rcongiu

I am able to create a table and process single-line JSON data, but my question is: can I process JSON data like the example below?

[{
	"field1": "data1",
	"field2": 100,
	"field3": "more data1",
	"field4": 123.001
}, {
	"field1": "data2",
	"field2": 200,
	"field3": "more data2",
	"field4": 123.002
}, {
	"field1": "data3",
	"field2": 300,
	"field3": "more data3",
	"field4": 123.003
}, {
	"field1": "data4",
	"field2": 400,
	"field3": "more data4",
	"field4": 123.004
}]

I have read that multiline JSON data wasn't supported in Hive, but can I use it now? If yes, please share links where I can read about this. I have searched a lot but couldn't find any relevant material. Thanks in advance.

debuggerrr avatar Nov 13 '17 10:11 debuggerrr

Can I have an answer for this? @rcongiu

debuggerrr avatar Nov 14 '17 17:11 debuggerrr

Does anyone have an answer to this question? I have the same issue. Thanks.

vincenzocapel avatar Oct 20 '18 19:10 vincenzocapel

Since a normal JSON record should start with '{' and end with '}', maybe you can remove the '[' and ']' and then set row format delimited fields terminated by ','
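
One thing to watch with this idea: after dropping the '[' and ']', commas still appear inside every object as well as between objects, so a plain split on ',' would cut records apart. A rough Python sketch of walking the objects one at a time instead, using `json.JSONDecoder.raw_decode` from the standard library, is below. The helper name `iter_objects` and the sample string are made up for illustration, and it assumes every top-level element is a JSON object.

```python
import json


def iter_objects(text):
    """Yield each top-level JSON object found in `text`, in order."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # Skip whitespace, separating commas, and any leftover brackets.
        while pos < end and text[pos] in " \t\r\n,[]":
            pos += 1
        if pos >= end:
            break
        # raw_decode parses one complete value and returns where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Array contents with the outer brackets already removed.
stripped = '{"field1": "data1", "field2": 100}, {\n\t"field1": "data2",\n\t"field2": 200\n}'

for record in iter_objects(stripped):
    # Re-emit each record on its own line.
    print(json.dumps(record))
```

Each printed line is a complete JSON object, which matches the one-record-per-line layout mentioned earlier in the thread.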

fayedd avatar Oct 23 '18 07:10 fayedd