
org.apache.hadoop.hbase.client.RetriesExhaustedException

Open QingyangKong opened this issue 7 years ago • 8 comments

I am trying to use shc on HDP. The Spark version in the cluster is 1.5.2, and the command I run is:

    spark-submit --class org.apache.spark.sql.execution.datasources.hbase.examples.HBaseSource \
      --master yarn-client \
      --jars /usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar \
      --files /usr/hdp/current/hbase-client/conf/hbase-site.xml,/usr/hdp/current/hbase-client/conf/hdfs-site.xml \
      /path/to/hbase-spark-connector-1.0.0.jar

Exception:

Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Aug 10 14:55:21 EDT 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68118: row 'table2,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname={hostName},16020,1470239773946, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isTableAvailable(ConnectionManager.java:991)
        at org.apache.hadoop.hbase.client.HBaseAdmin.isTableAvailable(HBaseAdmin.java:1400)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
        at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:66)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:170)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
        at org.apache.spark.sql.execution.datasources.hbase.examples.HBaseSource$.main(HBaseSource.scala:92)
        at org.apache.spark.sql.execution.datasources.hbase.examples.HBaseSource.main(HBaseSource.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68118: row 'table2,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname={hostName},16020,1470239773946, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to {hostname}/{ip}:16020 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to {hostname}/{ip}:16020 is closing. Call id=9, waitTime=1
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1259)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
        at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:199)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
        ... 4 more
Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to {hostName}/{ip}:16020 is closing. Call id=9, waitTime=1
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1047)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:846)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:574)

The cluster uses Kerberos authentication. The RegionServers are running fine, and I can create and drop tables through the HBase shell. I guess it is a permission or configuration problem, but I cannot figure it out.
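
For context, the trace shows the failure on shc's write path: DataFrameWriter.save leads to HBaseRelation.createTable, which calls isTableAvailable and scans hbase:meta. A minimal sketch of that path, assuming an illustrative catalog and table name (not the exact HBaseSource code):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    val sc = new SparkContext(new SparkConf().setAppName("HBaseSourceSketch"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Illustrative catalog: maps a two-column DataFrame onto HBase table
    // "table2" (the table named in the trace) with one column family "cf1".
    val catalog = s"""{
        |"table":{"namespace":"default", "name":"table2"},
        |"rowkey":"key",
        |"columns":{
          |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
          |"col1":{"cf":"cf1", "col":"col1", "type":"string"}
        |}
      |}""".stripMargin

    case class Record(col0: String, col1: String)
    val df = sc.parallelize(Seq(Record("row1", "value1"))).toDF()

    // save() triggers HBaseRelation.createTable -> isTableAvailable, the
    // hbase:meta scan that times out in the stack trace above.
    df.write
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
                   HBaseTableCatalog.newTable -> "5"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()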

QingyangKong avatar Aug 10 '16 20:08 QingyangKong

Make sure the ZooKeeper hostnames are resolvable. That solved my problem when connecting to a remote cluster.
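
A quick way to check, from the machine that runs the driver (the hostname below is a placeholder; test each entry from hbase.zookeeper.quorum):

    import java.net.InetAddress

    // Throws java.net.UnknownHostException if the name does not resolve.
    println(InetAddress.getByName("zk1.example.com").getHostAddress)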

nicodv avatar Nov 22 '16 22:11 nicodv

@nicodv Nope, I don't think the problem is the ZooKeeper hostnames. My hostnames seem correct, but still no luck.

dalmatele avatar Nov 24 '16 08:11 dalmatele

Hello, did you solve this problem? I have run into it as well.

sevenliu5688 avatar Mar 31 '17 09:03 sevenliu5688

@lvtu5688 Sorry man, I forget; it was a very long time ago. Judging from the command I used to submit, I guess the problem was that I did not provide properties such as "zookeeper.znode.parent", "hbase.zookeeper.quorum", and "hbase.master". Try adding these to your HBase configuration (see the sketch below) and put all the required jars on the classpath; the hbase classpath command prints them.
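
For illustration, these are the properties in question; the values are placeholders for whatever your cluster actually uses (normally they come from hbase-site.xml rather than from code):

    import org.apache.hadoop.hbase.HBaseConfiguration

    // create() loads hbase-default.xml and hbase-site.xml from the classpath;
    // the explicit set() calls below use placeholder values.
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase-unsecure") // HDP typically uses /hbase-unsecure or /hbase-secure
    conf.set("hbase.master", "master.example.com:16000")  // largely vestigial; clients discover the master via ZooKeeper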

QingyangKong avatar Mar 31 '17 15:03 QingyangKong

Hey, same issue here. Could you provide any info about how to solve this problem? I'm using Spark 1.6 on a CDH cluster. All "*.zookeeper.*" properties exist in hbase-site.xml.

zagorulkinde avatar May 24 '17 17:05 zagorulkinde

In the DEBUG log of your task you can find the following line:

    ipc.RpcClientImpl: Use SIMPLE authentication for service ClientService, sasl=false

So the reason is that the HBase client does not use Kerberos (in fact, the client does not know which authentication method to use).

To resolve the issue you just need to add hbase-site.xml to the driver classpath: --conf "spark.driver.extraClassPath=/etc/hbase/conf/"
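
A quick way to verify the driver actually picked the file up is to inspect the resulting client configuration (these are standard HBase property names; the expected values depend on your cluster):

    import org.apache.hadoop.hbase.HBaseConfiguration

    // HBaseConfiguration.create() loads hbase-site.xml from the classpath.
    val conf = HBaseConfiguration.create()

    // On a secure cluster this should print "kerberos"; "simple" means the
    // client never saw your hbase-site.xml, which matches the symptom above.
    println(conf.get("hbase.security.authentication"))

    // Should print your real quorum, not the "localhost" default.
    println(conf.get("hbase.zookeeper.quorum"))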

NorsaG avatar Mar 15 '18 09:03 NorsaG

Hi Norsa, I am getting the same issue in PROD. Will this solve my problem?

nitish001100 avatar Sep 06 '18 08:09 nitish001100

Yes, as I wrote in my comment:

To resolve the issue you just need to add hbase-site.xml to the driver classpath: --conf "spark.driver.extraClassPath=/etc/hbase/conf/"

NorsaG avatar Sep 06 '18 10:09 NorsaG