
Error loading into solr table from another hive table.

Open aftnix opened this issue 8 years ago • 6 comments

```shell
sudo -u solr bin/solr create -c hiveCollection -d basic_configs -n hiveCollection -s 2 -rf 2
```

```sql
CREATE EXTERNAL TABLE authproc_syslog_solr (
  hid STRING, tstamp TIMESTAMP, type STRING, msg STRING, thost STRING,
  tservice STRING, tyear STRING, tmonth STRING, tday STRING)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/tmp/solr'
TBLPROPERTIES (
  'solr.zkhost' = 'hadoop1.openstacksetup.com:2181/solr',
  'solr.collection' = 'hiveCollection',
  'solr.query' = '*:*');

INSERT OVERWRITE TABLE authproc_syslog_solr SELECT s.* FROM authproc_syslog s;
```

```
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:32, Vertex vertex_1473357519389_0194_6_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1473357519389_0194_6_00, diagnostics=[Task failed, taskId=task_1473357519389_0194_6_00_000009, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
Caused by: java.lang.NullPointerException
        at com.lucidworks.hadoop.io.impl.LWSolrDocument.getId(LWSolrDocument.java:46)
        at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:184)
        at com.lucidworks.hadoop.hive.LWHiveOutputFormat$1.write(LWHiveOutputFormat.java:39)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:764)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:102)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
```

The Hive table and the hive_solr table have exactly the same schema.

aftnix avatar Sep 25 '16 11:09 aftnix

Turns out my table didn't have `id` as the first field. I fixed that, but now the INSERT never finishes (I waited a couple of hours, reduced the dataset, etc., but the query never completes).
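For reference, the corrected DDL looked roughly like this. It is a sketch with the column list shortened, and it assumes the source table supplies a unique `id` value for every row:

```sql
-- The storage handler takes the first column as the Solr document id;
-- a missing/NULL id is what appears to trigger the NPE in
-- LWSolrDocument.getId().
CREATE EXTERNAL TABLE authproc_syslog_solr (
  id STRING,            -- unique key, declared first
  tstamp TIMESTAMP,
  type STRING,
  msg STRING)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/tmp/solr'
TBLPROPERTIES (
  'solr.zkhost' = 'hadoop1.openstacksetup.com:2181/solr',
  'solr.collection' = 'hiveCollection');
```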

The Yarn logs contain these:

```
2016-09-26 17:24:21,082 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1474881768573_0002_2][Event:VERTEX_FINISHED]: vertexName=Map 1, vertexId=vertex_1474881768573_0002_2_00, initRequestedTime=1474883498501, initedTime=1474883499061, startRequestedTime=1474883498577, startedTime=1474883499061, finishTime=1474889061031, timeTaken=5561970, status=KILLED, diagnostics=Vertex received Kill while in RUNNING state.
Vertex did not succeed due to DAG_KILL, failedTasks:0 killedTasks:3
Vertex vertex_1474881768573_0002_2_00 [Map 1] killed/failed due to:DAG_KILL, counters=Counters: 0, vertexStats=firstTaskStartTime=1474883503313, firstTasksToStart=[ task_1474881768573_0002_2_00_000001 ], lastTaskFinishTime=1474889061030, lastTasksToFinish=[ task_1474881768573_0002_2_00_000002,task_1474881768573_0002_2_00_000001 ], minTaskDuration=-1, maxTaskDuration=-1, avgTaskDuration=-1.0, numSuccessfulTasks=0, shortestDurationTasks=[  ], longestDurationTasks=[  ], vertexTaskStats={numFailedTaskAttempts=0, numKilledTaskAttempts=0, numCompletedTasks=3, numSucceededTasks=0, numKilledTasks=3, numFailedTasks=0}
```

I don't know what's going wrong here :(

aftnix avatar Sep 26 '16 09:09 aftnix

Sorry for taking a few days to get back to you.

Are there any errors besides those messages?

Can you also share a little bit about your environment - it seems you're using Tez? What version/distro of Hive?

ctargett avatar Sep 30 '16 20:09 ctargett

I am able to load data into the Solr external table from another managed Hive table, but when I try to retrieve data from the Solr table it throws:

```
Failed with exception java.io.IOException:java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
```

I am using solr-hive-serde-2.2.6.jar on Hive 1.1.0-cdh5.4.5.
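One thing worth checking, as a guess rather than a confirmed diagnosis: this cast error usually means a Hive column is declared STRING while the corresponding Solr field returns an integer. The table and field names below are hypothetical, not from the report above:

```sql
-- Hypothetical: if the Solr field 'count_i' holds integers, declare the
-- matching Hive column as INT rather than STRING, so the returned
-- java.lang.Integer is not cast to java.lang.String on read:
CREATE EXTERNAL TABLE solr_events (
  id STRING,
  count_i INT)   -- matches the Solr int field type
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
TBLPROPERTIES (
  'solr.zkhost' = 'zkhost:2181/solr',
  'solr.collection' = 'events');
```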

vishnucg avatar Nov 04 '16 17:11 vishnucg

@vishnucg can you please open a new issue with your question?

acesar avatar Nov 05 '16 00:11 acesar

Did this issue get resolved? I'm getting the same error

shazack avatar Mar 02 '17 19:03 shazack

I'm getting:

```
Caused by: java.lang.NullPointerException
        at com.lucidworks.hadoop.io.impl.LWSolrDocument.getId(LWSolrDocument.java:46)
        at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:190)
        ... 22 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":null,"_col1":null,"_col2":null,"_col3":null,"_col4":null,"_col5":null,"_col6":null,"_col7":null,"_col8":null,"_col9":null,"_col10":null,"_col11":null,"_col12":null,"_col13":null,"_col14":null,"_col15":null,"_col16
```
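Every value in that row is null, which may mean the document id column is null as well; the stack trace shows the NPE coming from `LWSolrDocument.getId`. A hypothetical guard (table and column names are placeholders, not from this thread):

```sql
-- Skip rows whose key column is NULL before writing to the Solr table,
-- since a NULL document id appears to trigger the NPE in
-- LWSolrDocument.getId:
INSERT OVERWRITE TABLE my_solr_table
SELECT s.*
FROM source_table s
WHERE s.id IS NOT NULL;
```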

shazack avatar Mar 02 '17 19:03 shazack