
Not able to write to an hbase table

Open · surrey-kapkoti opened this issue 8 years ago · 1 comment

I have an EMR cluster on which Spark is running, and another EMR cluster on which HBase is running. I have created a table named 'TableForSpark' on the HBase cluster, and I'm trying to write data to it using the following code:

```scala
import it.nerdammer.spark.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
//import org.apache.spark.sql.execution.datasources.hbase._

object hbaseTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Hbase test")
    //conf.set("spark.hbase.host", "192.168.0.23")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 10).map(i => (i.toString, i + 1, "Hello"))

    // the tuple has two value fields after the row key, so two distinct column names are needed
    val rdd1 = rdd.toHBaseTable("TableForSpark")
      .toColumns("column1", "column2")
      .inColumnFamily("cf")
    rdd1.save()
  }
}
```
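For reference, here is a minimal pure-Scala sketch of the mapping the connector appears to apply to each tuple (an assumption based on its `toColumns` API, not taken from its source): the first element becomes the HBase row key, and the remaining elements pair up positionally with the column names, which is why a 3-tuple needs two distinct column names.

```scala
// Sketch of the assumed contract: first tuple element = row key,
// remaining elements pair positionally with the toColumns names.
object TupleMappingSketch {
  def mapRow(row: (String, Int, String), columns: Seq[String]): (String, Seq[(String, Any)]) = {
    val rowKey = row._1
    // one column name per remaining tuple element
    val cells = columns.zip(Seq[Any](row._2, row._3))
    (rowKey, cells)
  }

  def main(args: Array[String]): Unit = {
    val (key, cells) = mapRow(("1", 2, "Hello"), Seq("column1", "column2"))
    println(s"$key -> $cells")
  }
}
```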

I have built 'spark-hbase-connector' with Scala 2.11.8 on Spark 2.0.0.

When I submit the job using the following command, it gets stuck in the last stage:

```shell
sudo spark-submit --deploy-mode client --jars $(echo lib/*.jar | tr ' ' ',') --class com.oreilly.learningsparkexamples.hbaseTest target/scala-2.11/hbase-test_2.11-0.0.1.jar
```

I have also placed the hbase-site.xml file in the resources folder, and the program correctly picks up the ZooKeeper IP from it.
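If picking the quorum up from hbase-site.xml ever proves unreliable, the connector also reads a `spark.hbase.host` property (the same one commented out in the code above), which can be set explicitly at submit time. A sketch, where the host below is a placeholder for the HBase cluster's ZooKeeper address:

```shell
# Placeholder host: substitute the ZooKeeper host of the HBase EMR cluster
sudo spark-submit --deploy-mode client \
  --conf spark.hbase.host=ip-10-60-1-xxx.ec2.internal \
  --jars $(echo lib/*.jar | tr ' ' ',') \
  --class com.oreilly.learningsparkexamples.hbaseTest \
  target/scala-2.11/hbase-test_2.11-0.0.1.jar
```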

I have checked the task logs: it is able to connect to ZooKeeper but not able to write to HBase. Could anyone throw some light on the problem?

The last part of the log looks like this:

```
16/08/18 11:48:35 INFO YarnClientSchedulerBackend: Application application_1470825934412_0088 has started running.
16/08/18 11:48:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46496.
16/08/18 11:48:35 INFO NettyBlockTransferService: Server created on 10.60.0.xxx:46496
16/08/18 11:48:35 INFO BlockManager: external shuffle service port = 7337
16/08/18 11:48:35 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.60.0.13, 46496)
16/08/18 11:48:35 INFO BlockManagerMasterEndpoint: Registering block manager 10.60.0.xxx:46496 with 414.4 MB RAM, BlockManagerId(driver, 10.60.0.xxx, 46496)
16/08/18 11:48:35 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.60.0.13, 46496)
16/08/18 11:48:36 INFO EventLoggingListener: Logging events to hdfs:///var/log/spark/apps/application_1470825934412_0088
16/08/18 11:48:36 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
16/08/18 11:48:36 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/08/18 11:48:36 INFO SparkContext: Starting job: saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102
16/08/18 11:48:36 INFO DAGScheduler: Got job 0 (saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102) with 2 output partitions
16/08/18 11:48:36 INFO DAGScheduler: Final stage: ResultStage 0 (saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102)
16/08/18 11:48:36 INFO DAGScheduler: Parents of final stage: List()
16/08/18 11:48:36 INFO DAGScheduler: Missing parents: List()
16/08/18 11:48:36 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at map at HBaseWriterBuilder.scala:66), which has no missing parents
16/08/18 11:48:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 89.1 KB, free 414.4 MB)
16/08/18 11:48:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.2 KB, free 414.3 MB)
16/08/18 11:48:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.60.0.13:46496 (size: 33.2 KB, free: 414.4 MB)
16/08/18 11:48:37 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/18 11:48:37 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at map at HBaseWriterBuilder.scala:66)
16/08/18 11:48:37 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
16/08/18 11:48:37 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.60.0.134:53842) with ID 1
16/08/18 11:48:42 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1)
16/08/18 11:48:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-60-0-xxx.ec2.internal, partition 0, PROCESS_LOCAL, 5427 bytes)
16/08/18 11:48:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-10-60-0-xxx.ec2.internal, partition 1, PROCESS_LOCAL, 5484 bytes)
16/08/18 11:48:42 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-60-0-xxx.ec2.internal:34581 with 2.8 GB RAM, BlockManagerId(1, ip-10-60-0-134.ec2.internal, 34581)
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 1 hostname: ip-10-60-0-xxx.ec2.internal.
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 1 hostname: ip-10-60-0-xxx.ec2.internal.
16/08/18 11:48:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-60-0-xxx.ec2.internal:34581 (size: 33.2 KB, free: 2.8 GB)
```

It gets stuck at this point.
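A hang at exactly this point (tasks launched, then silence) is often a network problem between the two clusters: the executors reach ZooKeeper, obtain the region server addresses, and then silently retry connections that the Spark cluster's security groups block. A quick probe from a worker node can rule this out; the hostnames below are placeholders and the ports are the HBase 1.x defaults, so adjust both to match the HBase cluster's hbase-site.xml:

```shell
# Placeholder hostnames: use the HBase cluster's actual nodes
nc -z -w 2 ip-10-60-1-xxx.ec2.internal 2181  && echo "ZooKeeper reachable"
nc -z -w 2 ip-10-60-1-xxx.ec2.internal 16020 && echo "RegionServer reachable"
nc -z -w 2 ip-10-60-1-xxx.ec2.internal 16000 && echo "HMaster reachable"
```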

Thanks & Regards, Surender.

surrey-kapkoti · Aug 18 '16 12:08

@surrey-kapkoti any solution on that?

fbbergamo · Nov 23 '17 18:11