shc
NullPointerException during connection creation.
I am hitting an issue while submitting an example in yarn-cluster deploy mode.
16/07/21 11:08:55 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, cdh52.vm.com): java.lang.NullPointerException
    at org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:43)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:214)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    at org.apache.spark.sql.execution.datasources.hbase.TableResource.init(HBaseResources.scala:126)
    at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.liftedTree1$1(HBaseResources.scala:57)
    at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.acquire(HBaseResources.scala:54)
    at org.apache.spark.sql.execution.datasources.hbase.TableResource.acquire(HBaseResources.scala:121)
    at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:74)
    at org.apache.spark.sql.execution.datasources.hbase.TableResource.releaseOnException(HBaseResources.scala:121)
    at org.apache.spark.sql.execution.datasources.hbase.TableResource.getScanner(HBaseResources.scala:145)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$9.apply(HBaseTableScan.scala:277)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$9.apply(HBaseTableScan.scala:276)
    at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
    at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
    at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
    at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
I have hbase-site.xml on the classpath, and it is present in the Spark conf dir too.
Did you hit it in yarn-client mode? If not, please try passing --files hbase-site.xml in your spark-submit script.
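For reference, I am assuming a minimal read along these lines (the catalog, table, and column names below are placeholders); the executors that create the HBase connection need to see hbase-site.xml as well, which is what --files gives you:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    object ReadSketch {
      def main(args: Array[String]): Unit = {
        // Placeholder catalog: table "table1" with one column family "cf1".
        val catalog =
          """{
            |"table":{"namespace":"default", "name":"table1"},
            |"rowkey":"key",
            |"columns":{
            |  "col0":{"cf":"rowkey", "col":"key", "type":"string"},
            |  "col1":{"cf":"cf1", "col":"col1", "type":"string"}
            |}
            |}""".stripMargin

        val spark = SparkSession.builder().appName("shc-read").getOrCreate()

        // hbase-site.xml must reach the executors too, e.g. via --files.
        val df = spark.read
          .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
          .format("org.apache.spark.sql.execution.datasources.hbase")
          .load()
        df.show()
      }
    }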
yarn-client mode works fine; I ran the test in yarn-cluster mode. I guess adding the credentials via addCreds(), as is done in hbase-spark for the same implementation, should fix it. Any comments?
Try putting your hbase-site.xml in the root of your jar (i.e. src/main/resources/hbase-site.xml).
I am hitting this issue in yarn-client mode, but only for reading from HBase (write works). I've tried hbase-site.xml in the root of the jar and in the driver classpath.
Facing the same exception for HBase reads in yarn-client mode, but writes work. I am passing hbase-site.xml via both --files and SPARK_CLASSPATH, and also setting HADOOP_CONF_DIR=/etc/hbase/conf.
@AbhiMadav since you are able to read, could you please share your parameters and any env exports you are doing?
Exception:
java.lang.NullPointerException
    at org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:122)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:214)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
UPDATE: Read and write both work in Spark local mode, but read fails in yarn-client mode.
Sorry for the late reply; I had been busy lately. I was able to get it working for yarn-client mode (read/write). hbase-site.xml has to be on the classpath if you have a property that overrides hbase-default.xml.
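In case it helps, the same things can also be set from code; a rough sketch below (the /etc/hbase/conf path is just an example):

    import org.apache.spark.sql.SparkSession

    // Rough sketch: make hbase-site.xml visible to the driver and executors.
    // The /etc/hbase/conf path is an example; in yarn-client mode the driver
    // classpath is normally set up front via spark-defaults or --driver-class-path.
    val spark = SparkSession.builder()
      .appName("shc-yarn-client")
      // Ship hbase-site.xml to executors (same effect as --files on spark-submit).
      .config("spark.files", "/etc/hbase/conf/hbase-site.xml")
      // Put the HBase conf dir on the executor classpath so properties that
      // override hbase-default.xml are picked up.
      .config("spark.executor.extraClassPath", "/etc/hbase/conf")
      .getOrCreate()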
@sudhirpatil HADOOP_CONF_DIR should point to a directory where all of the *-site.xml files can be found, not just hbase-site.xml. You could also create a symlink to hbase-site.xml in the /etc/hadoop/conf dir and use that as HADOOP_CONF_DIR. If you are still running into the issue, could you share your spark-submit command?
Same issue: yarn-client mode cannot read. It turns out "def hbaseConf = wrappedConf.value.value" cannot be transferred to the workers somehow. Any suggestions? Currently I create the conf directly on the workers as a workaround, roughly as sketched below.
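The workaround looks roughly like this on my side (table and column family names are placeholders):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.sql.SparkSession

    object WorkerConfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("conf-on-workers").getOrCreate()
        val rowKeys = spark.sparkContext.parallelize(Seq("row1", "row2"))

        val values = rowKeys.mapPartitions { keys =>
          // Build the configuration on the worker instead of relying on the
          // broadcast conf; hbase-site.xml has to be on the executor classpath.
          val conf = HBaseConfiguration.create()
          val connection = ConnectionFactory.createConnection(conf)
          val table = connection.getTable(TableName.valueOf("table1"))
          val results = keys.map { key =>
            val r = table.get(new Get(Bytes.toBytes(key)))
            Bytes.toString(r.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("col1")))
          }.toList // materialize before closing the connection
          table.close()
          connection.close()
          results.iterator
        }

        values.collect().foreach(println)
      }
    }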
Mine was working in yarn-cluster mode at first; suddenly it started producing this exception. Any idea why?
OK, when I removed Kryo serialization it started working as normal.
Isn't there a way to use it with Kryo? JavaSerializer is so slow and sometimes unable to serialize some things.
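For context, the kind of Kryo config I mean is just this (a minimal sketch; the registered classes are examples only, and I have not verified that this avoids the NPE with shc's broadcast configuration):

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Sketch of enabling Kryo as the serializer.
    val conf = new SparkConf()
      .setAppName("shc-kryo")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(
        classOf[org.apache.hadoop.hbase.client.Result],
        classOf[Array[Byte]]
      ))
    val spark = SparkSession.builder().config(conf).getOrCreate()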
I am trying to read and write data using Spark. I have already added all the *-site.xml files to the classpath, and the HBase JARs and conf are passed in the files location. Reads work fine, but writes hit this exception:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:123)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:214)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.checkOutputSpecs(TableOutputFormat.java:177)
    at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.assertConf(SparkHadoopWriter.scala:387)
    at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
    at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:831)
    at com.voicebase.etl.s3tohbase.HbaseScan2.main(HbaseScan2.java:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
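For reference (in Scala for brevity), the shape of the write is roughly this, with real table and column names replaced by placeholders:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.sql.SparkSession

    object WriteSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hbase-write").getOrCreate()
        val sc = spark.sparkContext

        // Standard TableOutputFormat setup; "my_table", "cf", "col" are placeholders.
        val hbaseConf = HBaseConfiguration.create()
        hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
        val job = Job.getInstance(hbaseConf)
        job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

        val puts = sc.parallelize(Seq("row1" -> "value1")).map { case (rowKey, value) =>
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
          (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
        }

        // On some hbase-server versions TableOutputFormat.checkOutputSpecs still NPEs
        // here; see the upgrade / validateOutputSpecs workarounds further down.
        puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
      }
    }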
@alchemistsrivastava Hi, did you fix it?
Oh, yeah, I have solved it.
@webtest444 How did you solve it? I'm facing the same issue as @alchemistsrivastava: read works fine but write throws the exception. I'm using IntelliJ and standalone HBase.
Hi @swarup5s, I know that it's been a while since your comment, but I'm responding just in case the problem still exists.
I had the same issue and fixed it by following this post: https://stackoverflow.com/questions/50925942/getting-null-pointer-exception-when-running-saveasnewapihadoopdataset-in-scala-s Not sure if it's the formal way to fix it, but it worked for me. I'm using Spark 2.4 with HBase in pseudo-cluster mode on a Hadoop pseudo-cluster.
Hope this helps.
I found that this is a bug in hbase-server. You can solve it by upgrading the hbase-server version to 2.0+. Alternatively, you can add the Spark conf spark.hadoop.validateOutputSpecs=false to work around the problem.
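A minimal sketch of setting that conf from code (it can equally be passed to spark-submit with --conf):

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // With validateOutputSpecs disabled, Spark skips the checkOutputSpecs
    // call where this NPE is thrown.
    val conf = new SparkConf()
      .setAppName("hbase-write")
      .set("spark.hadoop.validateOutputSpecs", "false")
    val spark = SparkSession.builder().config(conf).getOrCreate()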