elephant-bird

Fix protobuf serde errors

Open angushe opened this issue 9 years ago • 12 comments

Hi,

This is a pull request trying to fix the same problem described in pull request #400, and the fix has been tested successfully on Hive 0.12/0.13 and Protobuf 2.4.1/2.5.0.

Any comments?

Thanks Angus

angushe avatar Dec 04 '14 15:12 angushe

I used this patch on cdh5.1.2 with Hive 0.12.0-cdh5.1.0 and confirmed that the bug in Issue #400 was resolved. This patch looks good to me.

miltonwulei avatar Feb 02 '15 09:02 miltonwulei

I used this with Hive 0.13.1-cdh5.3.1 and Protobuf 2.5.0 in order to resolve Issue #400. Any chance of this getting merged soon? Thanks for this patch angushe!

cooper6581 avatar Feb 27 '15 00:02 cooper6581

Solves #400 for me on Hive 0.13.1 and Protobuf 2.5.0 on AWS AMI 3.3.1. Great fix, will it get merged soon?

harelglik avatar Mar 03 '15 11:03 harelglik

Used to resolve issue #400 with Protobuf 2.5.0, Hive 0.14.0 & HDP 2.2. Thanks

alastrange avatar May 19 '15 15:05 alastrange

@rangadi I don't know too much about protobuf dynamic messages, would you mind giving this a look too?

There's a lot of casting + instanceof going on in here where there previously wasn't -- is that part of the direct fix for the issue, or is it just the only way to use DynamicMessage?

isnotinvain avatar May 20 '15 02:05 isnotinvain

We haven't actively used the Hive SerDes. I will take a look anyway.

rangadi avatar May 20 '15 19:05 rangadi

The fix looks good. I am not sure about Alex's comment on hashCode(). I just have one comment: do we ever expect a Message object?

rangadi avatar May 20 '15 19:05 rangadi

Let's assume this patch will never be merged. In this case, I would like to optimize this pull request's SEO.

I was seeing issues like this when running Hive queries that require a MapReduce job against Protobuf external tables. The problem did not show up with queries like:

SELECT * FROM protobuf_external_table LIMIT 1;

But when running a query like this:

SELECT DISTINCT(field.subfield) FROM protobuf_external_table;

I would get a traceback:

17/02/14 10:09:37 [LocalJobRunner Map Task Executor #0]: ERROR mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable <LOTS OF BYTES>
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: FieldDescriptor does not match message type.
	at com.google.protobuf.GeneratedMessage$FieldAccessorTable.getField(GeneratedMessage.java:1536)
	at com.google.protobuf.GeneratedMessage$FieldAccessorTable.access$100(GeneratedMessage.java:1449)
	at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:366)
	at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:228)
	at io.arbor.elephantbird.ProtobufStructObjectInspector.setStructFieldData(ProtobufStructObjectInspector.java:148)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:407)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:129)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
	... 10 more

Applying both revisions of this PR fixed the issue conclusively.
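
For anyone landing here from the exception message: in the protobuf Java runtime, a builder rejects a field descriptor whose containing type is not the very same descriptor object the builder was created for, so two descriptors that describe the same message but are separate instances do not mix. The sketch below is a toy model of that identity check only. MessageType, FieldRef, Builder and canSet are hypothetical stand-ins, not protobuf's or elephant-bird's real classes, and whether this is exactly how the serde ended up with a mismatched descriptor is an assumption, not something this thread confirms.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every class here is a hypothetical stand-in, not protobuf's real API.
class DescriptorMismatchSketch {

    /** Stand-in for a message descriptor; compared by object identity, never by name. */
    static final class MessageType {
        final String name;

        MessageType(String name) {
            this.name = name;
        }

        /** Creates a field reference that remembers this exact MessageType instance. */
        FieldRef field(String fieldName) {
            return new FieldRef(this, fieldName);
        }
    }

    /** Stand-in for a field descriptor. */
    static final class FieldRef {
        final MessageType containingType;
        final String name;

        FieldRef(MessageType containingType, String name) {
            this.containingType = containingType;
            this.name = name;
        }
    }

    /** Stand-in for a message builder bound to one MessageType instance. */
    static final class Builder {
        private final MessageType type;
        private final Map<String, Object> values = new HashMap<>();

        Builder(MessageType type) {
            this.type = type;
        }

        Builder setField(FieldRef field, Object value) {
            // Identity comparison: an equal-looking field from another instance is rejected.
            if (field.containingType != type) {
                throw new IllegalArgumentException("FieldDescriptor does not match message type.");
            }
            values.put(field.name, value);
            return this;
        }

        Object get(String fieldName) {
            return values.get(fieldName);
        }
    }

    /** Returns true if the builder accepts the field, false if it rejects it. */
    static boolean canSet(Builder builder, FieldRef field, Object value) {
        try {
            builder.setField(field, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        MessageType compiled = new MessageType("Person");
        MessageType rebuilt = new MessageType("Person"); // same name, different instance

        Builder builder = new Builder(compiled);
        System.out.println(canSet(builder, compiled.field("name"), "a")); // prints true
        System.out.println(canSet(builder, rebuilt.field("name"), "b"));  // prints false
    }
}
```

In this model the second call fails even though both types are named Person, which is the same shape of failure as the IllegalArgumentException in the traceback above.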

joshk0 avatar Feb 14 '17 15:02 joshk0

Thanks a lot for this, and please merge it ASAP. I applied the patch and it works like a charm now.

sugix avatar Mar 03 '17 19:03 sugix

Looks like way back when we had some questions on this PR that didn't get answered. Anyone interested in taking a look? I think we can merge this if someone wants to verify it's still working + address the review feedback?

isnotinvain avatar Mar 03 '17 21:03 isnotinvain

I am using elephant-bird-hive-4.15.jar, but I am still getting the same issue. Why?

agammishra avatar Mar 30 '18 14:03 agammishra
