
Problems with loading data into an Avro-backed table

Open ImMilind opened this issue 13 years ago • 2 comments

ISSUE 1:

  1. I have data serialized to a file using Java and the Avro APIs.
  2. Created a partitioned table in Hive with the same schema, using Haivvreo.
  3. Copied the file from step 1 to HDFS.
  4. Registered the partition with the table.
  5. Tried loading data into the table using:

hive> use serdetestdb; load data inpath '/user/immilind/Employee3.ser' into table employee_table partition (schema_def='Employee3', gen_time='2012110684533', arr_time='20121106090422');
OK
Time taken: 0.763 seconds
Loading data to table serdetestdb.employee_table partition (schema_def=Employee3, gen_time=2012110684533, arr_time=20121106090422)
OK
Time taken: 1.31 seconds

But a SELECT query does not return any data:

hive> use serdetestdb; select * from employee_table;
OK
Time taken: 0.016 seconds
OK
Time taken: 0.444 seconds

The file Employee3.ser is copied into the registered partition's directory.

What am I missing?
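One thing worth checking (an assumption on my part, not something confirmed in this thread): Haivvreo's AvroContainerInputFormat reads Avro *object container* files, i.e. files written with Avro's DataFileWriter, not raw record bytes written by a bare DatumWriter. Per the Avro spec, a container file begins with the 4-byte magic `Obj\x01`, so a stdlib-only Python sketch can test whether the serialized file is in the expected format (the path is hypothetical):

```python
# Sketch: check whether a file starts with the Avro object-container magic
# bytes b"Obj\x01" (defined by the Avro container-file specification).

AVRO_MAGIC = b"Obj\x01"

def looks_like_avro_container(path):
    """Return True if the file begins with the Avro container-file magic."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC

# Hypothetical usage:
# looks_like_avro_container("/tmp/Employee3.ser")
```

If this returns False, the file was likely serialized record-by-record rather than as a container file, and Hive would find zero rows in it even though the file sits in the partition directory.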

ISSUE 2:

Moreover, I am using Pig to load data from the table:

register /homes/immilind/haivvreo-1.0.12-avro15-hive81-SNAPSHOT.jar;
eventData = load 'serdetestdb.employee_table' using org.apache.hcatalog.pig.HCatLoader();
actualData = filter eventData by schema_def == 'Employee3' and gen_time == '2012110684533' and arr_time == '20121106090422';
dump actualData;

Though the jar file contains com.linkedin.haivvreo.AvroContainerInputFormat, it throws a class-not-found error for that class.

ImMilind avatar Nov 07 '12 23:11 ImMilind

Haivvreo + HCat isn't supported. HCat has some problems. I'm planning on adding support for this via the Avro Serde I moved to Hive, not necessarily through Haivvreo.
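For reference, the Avro SerDe that jghoman mentions moving into Hive ships in later Hive versions as org.apache.hadoop.hive.serde2.avro.AvroSerDe. A hedged sketch of the equivalent DDL, reusing the illustrative schema from this thread (the table name here is made up; the class and property names are the Hive-native ones, not Haivvreo's):

```sql
-- Sketch of the same table using the Hive-native Avro SerDe
-- rather than Haivvreo; table name employee_avro is illustrative.
CREATE EXTERNAL TABLE employee_avro
PARTITIONED BY (schema_def STRING, gen_time STRING, arr_time STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
  'avro.schema.literal' = '{
    "type": "record",
    "name": "employee3",
    "fields": [
      {"name": "name", "type": "string", "default": "NU"},
      {"name": "age",  "type": "int",    "default": 0},
      {"name": "dept", "type": "string", "default": "DU"}
    ]
  }'
)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
```

Note the property name differs: the Hive-native SerDe reads 'avro.schema.literal' (or 'avro.schema.url'), whereas the DDL below uses Haivvreo's serde property.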

jghoman avatar Nov 08 '12 00:11 jghoman

Well, what about Issue 1?

Table creation script:

CREATE EXTERNAL TABLE employee
PARTITIONED BY (schema_def string, gen_time string, arr_time string)
ROW FORMAT SERDE 'com.linkedin.haivvreo.AvroSerDe'
WITH SERDEPROPERTIES (
  'schema-literal' = '{
    "type": "record",
    "name": "employee3",
    "fields": [
      {"name": "name", "type": "string", "default": "NU"},
      {"name": "age",  "type": "int",    "default": 0},
      {"name": "dept", "type": "string", "default": "DU"}
    ]
  }'
)
STORED AS
  INPUTFORMAT 'com.linkedin.haivvreo.AvroContainerInputFormat'
  OUTPUTFORMAT 'com.linkedin.haivvreo.AvroContainerOutputFormat'

Schema used to serialize data:

{
  "type": "record",
  "name": "employee3",
  "fields": [
    {"name": "name", "type": "string", "default": "NU"},
    {"name": "age",  "type": "int",    "default": 0},
    {"name": "dept", "type": "string", "default": "DU"}
  ]
}

ImMilind avatar Nov 08 '12 15:11 ImMilind