[SUPPORT] HBase connection closed exception

Open xicm opened this issue 3 years ago • 3 comments

Describe the problem you faced

I described this in https://issues.apache.org/jira/browse/HUDI-3983.

I get a connection closed exception when using the HBase index. We use relocations in the Spark bundle; when I remove the relocations, the job succeeds.

I have spent a long time debugging the differences between running with and without the relocations, but found nothing.
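
For context, "relocation" here means the maven-shade-plugin relocations applied when building the Spark bundle. Below is a minimal sketch of the kind of entry involved, inferred from the org.apache.hudi.org.apache.hadoop.hbase prefix visible in the stacktrace further down; the exact entries live in packaging/hudi-spark-bundle/pom.xml:

  <!-- Sketch only: the shaded prefix matches the relocated class names
       seen in the stacktrace below. -->
  <relocation>
    <pattern>org.apache.hadoop.hbase.</pattern>
    <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
  </relocation>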

To Reproduce

Steps to reproduce the behavior:

Because we use relocation in the Spark bundle, the following configuration causes a ClassNotFoundException; comment out the listener class in hudi-common/src/main/resources/hbase-site.xml:

  <property>
    <name>hbase.status.listener.class</name>
    <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
    <description>
      Implementation of the status listener with a multicast message.
    </description>
  </property>

Adding this configuration surfaces the error quickly:

  <property>
    <name>hbase.client.retries.number</name>
    <value>0</value>
  </property>

  1. Add org.apache.hbase.thirdparty:hbase-shaded-gson to packaging/hudi-spark-bundle/pom.xml (see the dependency sketch after these steps).
  2. mvn clean package -DskipTests -Dspark3.1 -Dscala-2.12 -Dhadoop.version=3.3.0 -Dhive.version=3.1.2
  3. Write data with the HBase index:
    df.write.format("org.apache.hudi").
      options(getQuickstartWriteConfigs).
      option(PRECOMBINE_FIELD.key, "ts").
      option(RECORDKEY_FIELD.key, "uuid").
      option(PARTITIONPATH_FIELD.key, "partitionpath").
      option(TBL_NAME.key, tableName).
      option(TABLENAME.key(), tableName).
      option(INDEX_TYPE.key, "HBASE").
      option(ZKQUORUM.key, "${hbase.zookeeper.quorum}").
      option(ZKPORT.key, "2181").
      option(ZK_NODE_PATH.key, "${zookeeper.znode.parent}").
      option("hoodie.metadata.index.column.stats.enable", "true").
      option("hoodie.embed.timeline.server", "false").
      mode(Overwrite).
      save(tablePath)
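
A sketch of the dependency from step 1. The groupId and artifactId come from the step itself; the version property name here is a hypothetical placeholder and should match the hbase-thirdparty version the Hudi build already uses:

  <!-- ${hbase.thirdparty.version} is a hypothetical property; align it with
       the hbase-thirdparty version already managed by the Hudi build. -->
  <dependency>
    <groupId>org.apache.hbase.thirdparty</groupId>
    <artifactId>hbase-shaded-gson</artifactId>
    <version>${hbase.thirdparty.version}</version>
  </dependency>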

Expected behavior

The write job using the HBase index should succeed with the relocated (shaded) Spark bundle, just as it does when the relocations are removed.

Environment Description

  • Hudi version :

  • Spark version : 3.1.1

  • Hive version : 3.1.2

  • Hadoop version : 3.3.0

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

Stacktrace

org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
2022-08-26T07:12:57.603Z, RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100, maxAttempts=1}, org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
    at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:118)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.setException(Call.java:133)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:203)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:211)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:389)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:831)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 more
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
    ... 26 more

xicm avatar Aug 26 '22 08:08 xicm

@xicm we have to shade the HBase classes to stay compatible with the Hive query engine, which pulls in HBase classes as well. Does changing all the relevant class names to the shaded pattern in hudi-common/src/main/resources/hbase-site.xml work for you?

yihua avatar Aug 26 '22 23:08 yihua
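
For illustration, applying the shading pattern visible in the stacktrace, the listener property shown earlier would be renamed like this (other class-valued properties in hbase-site.xml would follow the same pattern):

  <property>
    <name>hbase.status.listener.class</name>
    <!-- Original value: org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener -->
    <value>org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
  </property>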

Thanks @yihua. I renamed the relevant class names to their shaded names in hbase-site.xml and still get the same exception.

xicm avatar Aug 29 '22 09:08 xicm

This page https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-considerations.html provides a workaround, but the problem in the Spark bundle still exists.

xicm avatar Sep 19 '22 07:09 xicm

@yihua : similar to our FAQ on metadata usage, can we add an FAQ entry for this?

nsivabalan avatar Oct 23 '22 04:10 nsivabalan