Problems loading data into an Avro-backed table
ISSUE 1:
- I have data serialized to a file using Java and the Avro APIs.
- Created a partitioned table in Hive with the same schema, using Haivvreo.
- Copied the file from step 1 to HDFS.
- Registered the partition with the table.
- Tried loading data into the table using:
hive> use serdetestdb;
OK
Time taken: 0.763 seconds
hive> load data inpath '/user/immilind/Employee3.ser' into table employee_table partition (schema_def='Employee3', gen_time='2012110684533', arr_time='20121106090422');
Loading data to table serdetestdb.employee_table partition (schema_def=Employee3, gen_time=2012110684533, arr_time=20121106090422)
OK
Time taken: 1.31 seconds
But a SELECT query does not return any data:

hive> use serdetestdb;
OK
Time taken: 0.016 seconds
hive> select * from employee_table;
OK
Time taken: 0.444 seconds
The file Employee3.ser does get copied into the registered partition's directory.
What am I missing?
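For context, the "registered the partition" step was presumably done with something like the following (a hypothetical reconstruction; the actual statement and HDFS location were not posted). A DESCRIBE FORMATTED on the partition can then confirm which directory the partition resolves to and whether Employee3.ser actually landed there:

```sql
-- Hypothetical reconstruction of "Registered the partition with the table";
-- the actual ADD PARTITION statement was not shown in the thread.
ALTER TABLE employee_table ADD PARTITION
  (schema_def='Employee3', gen_time='2012110684533', arr_time='20121106090422');

-- Check which HDFS directory the partition points to:
DESCRIBE FORMATTED employee_table
  PARTITION (schema_def='Employee3', gen_time='2012110684533', arr_time='20121106090422');
```

If the partition was registered with an explicit LOCATION elsewhere, LOAD DATA would move the file to that location rather than the table's default partition path, which is worth ruling out when a SELECT comes back empty.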
ISSUE 2:
I am also using Pig to load data from the table:
register /homes/immilind/haivvreo-1.0.12-avro15-hive81-SNAPSHOT.jar;
eventData = load 'serdetestdb.employee_table' using org.apache.hcatalog.pig.HCatLoader();
actualData = filter eventData by schema_def == 'Employee3' and gen_time == '2012110684533' and arr_time == '20121106090422';
dump actualData;
Even though the jar file contains com.linkedin.haivvreo.AvroContainerInputFormat, this throws a ClassNotFoundException.
Haivvreo + HCatalog isn't supported; HCat has some problems. I'm planning to add support for this via the Avro SerDe I moved into Hive, not necessarily through Haivvreo.
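For reference, the Avro SerDe moved into Hive mentioned here shipped as org.apache.hadoop.hive.serde2.avro.AvroSerDe. With it, a roughly equivalent table definition would look like the sketch below (the avro.schema.literal property and the I/O format class names come from the Hive-native SerDe, not from this thread):

```sql
-- Sketch of the same table using the Avro SerDe built into Hive;
-- class and property names are those of the Hive-native SerDe, assumed here.
CREATE EXTERNAL TABLE employee
PARTITIONED BY (schema_def string, gen_time string, arr_time string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal' = '{
  "type": "record",
  "name": "employee3",
  "fields": [
    {"name": "name", "type": "string", "default": "NU"},
    {"name": "age",  "type": "int",    "default": 0},
    {"name": "dept", "type": "string", "default": "DU"}
  ]
}');
```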
Well, what about Issue 1?
Table creation script:

CREATE EXTERNAL TABLE employee
PARTITIONED BY (schema_def string, gen_time string, arr_time string)
ROW FORMAT SERDE 'com.linkedin.haivvreo.AvroSerDe'
WITH SERDEPROPERTIES ('schema-literal' = '{
  "type": "record",
  "name": "employee3",
  "fields": [
    {"name": "name", "type": "string", "default": "NU"},
    {"name": "age",  "type": "int",    "default": 0},
    {"name": "dept", "type": "string", "default": "DU"}
  ]
}')
STORED AS INPUTFORMAT 'com.linkedin.haivvreo.AvroContainerInputFormat'
OUTPUTFORMAT 'com.linkedin.haivvreo.AvroContainerOutputFormat'
Schema used to serialize the data:

{
  "type": "record",
  "name": "employee3",
  "fields": [
    {"name": "name", "type": "string", "default": "NU"},
    {"name": "age",  "type": "int",    "default": 0},
    {"name": "dept", "type": "string", "default": "DU"}
  ]
}