hive-solr
Solr Index on Hive - Loading data into External table fails
Hello, I created a Solr index on a Hive table using the steps below. When I try to load rows from the Hive internal table into the Hive external table, the insert fails. Please help.
-
CREATE TABLE ER_ENTITY1000(entityid INT,claimid_s INT,firstname_s STRING,lastname_s STRING,addrline1_s STRING, addrline2_s STRING, city_s STRING, state_S STRING, country_s STRING, zipcode_s STRING, dob_s STRING, ssn_s STRING, dl_num_s STRING, proflic_s STRING, policynum_s STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-
LOAD DATA LOCAL INPATH '/home/Solr1.csv' OVERWRITE INTO TABLE ER_ENTITY1;
-
add jar /home/solr-hive-serde-3.0.0.jar;
CREATE EXTERNAL TABLE SOLR_ENTITY999(entityid INT,claimid_s INT,firstname_s STRING,lastname_s STRING,ssn_s STRING,dl_num_s STRING,city_s STRING,state_s STRING,country_s STRING,zipcode_s STRING)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/user/i98779/SOLR_ENTITY1'
TBLPROPERTIES('solr.server.url' = 'http://10.52.192.108:8983/solr','solr.collection' = 'er_entity','solr.query' = ':');
********** All above steps work fine **********
-
********** This step fails **********
INSERT OVERWRITE TABLE SOLR_ENTITY999 SELECT * FROM ER_ENTITY1000;
With error:
hive> INSERT OVERWRITE TABLE SOLR_ENTITY999 SELECT * FROM ER_ENTITY1000;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = i98779_20180308085142_3918b9ea-2158-4b0e-865f-2fcdefc17e4b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2018-03-08 08:51:45,993 Stage-1 map = 0%, reduce = 0%
Ended Job = job_local1283927429_0001 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
********** Error from the Hive job log is as below **********
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
@gseshwar Not sure if it is a typo, but in the second step you load the data into a different table from the one you created:
LOAD DATA LOCAL INPATH '/home/Solr1.csv' OVERWRITE INTO TABLE ER_ENTITY1;
ER_ENTITY1 -> ER_ENTITY1000
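If it was a typo, the load step would presumably target the table created in the first step. A minimal sketch, assuming the CSV path from the original post is correct; the `SELECT COUNT(*)` check is only a suggested sanity test, not something from the original steps:

```sql
-- Load the CSV into the table that was actually created (ER_ENTITY1000),
-- not ER_ENTITY1 as in the original step.
LOAD DATA LOCAL INPATH '/home/Solr1.csv' OVERWRITE INTO TABLE ER_ENTITY1000;

-- Suggested check (an addition, not from the original steps):
-- confirm the source table has rows before inserting into Solr.
SELECT COUNT(*) FROM ER_ENTITY1000;
```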
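As for the log line `FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask`: that is a generic MapReduce task failure and does not show the underlying cause.
You need to check the YARN logs to find the actual error.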
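One more thing that may be worth ruling out while you look at the logs: the two table definitions above do not match. ER_ENTITY1000 has 15 columns and SOLR_ENTITY999 has 10, in a different order, so `SELECT *` does not line up with the target table. I have not confirmed this is the cause of the failure; the following is only a sketch that names the target table's columns explicitly, in the order SOLR_ENTITY999 declares them:

```sql
-- Select only the columns that SOLR_ENTITY999 declares, in its declared order,
-- instead of relying on SELECT * from the wider source table.
INSERT OVERWRITE TABLE SOLR_ENTITY999
SELECT entityid,
       claimid_s,
       firstname_s,
       lastname_s,
       ssn_s,
       dl_num_s,
       city_s,
       state_s,
       country_s,
       zipcode_s
FROM ER_ENTITY1000;
```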