
[SUPPORT]IllegalStateException: Trying to access closed classloader

Open hbgstc123 opened this issue 2 years ago • 7 comments

Describe the problem you faced

A Flink job that streaming-reads from a Hudi source and streaming-writes to a Hudi sink. This error happened after the job had run for about 4 hours and caused the job to restart.

java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.

To Reproduce

Steps to reproduce the behavior:

  1. Flink job that streaming-reads from a Hudi source and streaming-writes to a Hudi sink (a minimal sketch of such a pipeline is shown below)
  2. The error happened after the job had run for about 4 hours; not sure it can be reproduced reliably
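
For context, here is a minimal sketch of the kind of pipeline described above, using the Flink Table API with the Hudi connector. The table names, schema, and HDFS paths are placeholders and not taken from the issue; only the general shape (streaming read from one Hudi table, upsert write into another) reflects the report.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HudiStreamPipeline {
  public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000); // Hudi Flink writes commit on checkpoint
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

    // Source table: streaming (incremental) read from an existing Hudi table.
    // Names and paths are hypothetical.
    tEnv.executeSql(
        "CREATE TABLE hudi_source (\n"
      + "  id STRING PRIMARY KEY NOT ENFORCED,\n"
      + "  val STRING,\n"
      + "  ts TIMESTAMP(3)\n"
      + ") WITH (\n"
      + "  'connector' = 'hudi',\n"
      + "  'path' = 'hdfs:///warehouse/hudi_source',\n"
      + "  'table.type' = 'MERGE_ON_READ',\n"
      + "  'read.streaming.enabled' = 'true',\n"
      + "  'read.streaming.check-interval' = '60'\n"
      + ")");

    // Sink table: another Hudi table written with upserts.
    tEnv.executeSql(
        "CREATE TABLE hudi_sink (\n"
      + "  id STRING PRIMARY KEY NOT ENFORCED,\n"
      + "  val STRING,\n"
      + "  ts TIMESTAMP(3)\n"
      + ") WITH (\n"
      + "  'connector' = 'hudi',\n"
      + "  'path' = 'hdfs:///warehouse/hudi_sink',\n"
      + "  'table.type' = 'MERGE_ON_READ',\n"
      + "  'write.operation' = 'upsert'\n"
      + ")");

    // Continuous insert from the streaming source into the sink.
    tEnv.executeSql("INSERT INTO hudi_sink SELECT id, val, ts FROM hudi_source");
  }
}
```
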

Environment Description

  • Hudi version : 0.12.1

  • Flink version : 1.15

  • Running on Docker? (yes/no) : no

Additional context

Add any other context about the problem here.

Stacktrace

java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
	at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
	at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:172)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2366)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2331)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427)
	at org.apache.hadoop.ipc.RPC.getProtocolEngine(RPC.java:209)
	at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:607)
	at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:573)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:546)
	at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.createClientDatanodeProtocolProxy(ClientDatanodeProtocolTranslatorPB.java:187)
	at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.createClientDatanodeProtocolProxy(ClientDatanodeProtocolTranslatorPB.java:178)
	at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.&lt;init&gt;(ClientDatanodeProtocolTranslatorPB.java:127)
	at org.apache.hadoop.hdfs.DFSUtilClient.createClientDatanodeProtocolProxy(DFSUtilClient.java:603)
	at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:323)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:296)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:227)
	at org.apache.hadoop.hdfs.DFSInputStream.&lt;init&gt;(DFSInputStream.java:211)
	at org.apache.hadoop.hdfs.DFSClient.openInternal(DFSClient.java:1146)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1132)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:351)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:347)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:360)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:919)
	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:468)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:754)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:305)
	at org.apache.hudi.common.table.timeline.HoodieDefaultTimeline.getInstantDetails(HoodieDefaultTimeline.java:397)
	at org.apache.hudi.hadoop.utils.HoodieInputFormatUtils.getCommitMetadata(HoodieInputFormatUtils.java:517)
	at org.apache.hudi.sink.partitioner.profile.WriteProfiles.getCommitMetadata(WriteProfiles.java:236)
	at org.apache.hudi.source.IncrementalInputSplits.lambda$inputSplits$2(IncrementalInputSplits.java:285)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
	at org.apache.hudi.source.IncrementalInputSplits.inputSplits(IncrementalInputSplits.java:285)
	at org.apache.hudi.source.StreamReadMonitoringFunction.monitorDirAndForwardSplits(StreamReadMonitoringFunction.java:199)
	at org.apache.hudi.source.StreamReadMonitoringFunction.run(StreamReadMonitoringFunction.java:172)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:128)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:73)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333)

hbgstc123 avatar Dec 22 '22 09:12 hbgstc123

@hbgstc123 Thanks for raising the issue. @danny0405 could you provide help here?

yihua avatar Jan 04 '23 00:01 yihua

@hbgstc123 Does this happen every few hours, or has it only happened once so far? Can you try upgrading to 0.12.2 and see how it goes?

xushiyan avatar Jan 07 '23 16:01 xushiyan

One suggestion is not to use a session cluster; session cluster mode is fragile with respect to classloaders.

danny0405 avatar Jan 09 '23 03:01 danny0405

> @hbgstc123 Does this happen every few hours, or has it only happened once so far? Can you try upgrading to 0.12.2 and see how it goes?

It happens every few hours, but after we set the config classloader.check-leaked-classloader = false it stopped happening.

hbgstc123 avatar Jan 10 '23 10:01 hbgstc123

> One suggestion is not to use a session cluster; session cluster mode is fragile with respect to classloaders.

We are using application mode.

hbgstc123 avatar Jan 10 '23 10:01 hbgstc123

> > @hbgstc123 Does this happen every few hours, or has it only happened once so far? Can you try upgrading to 0.12.2 and see how it goes?
>
> It happens every few hours, but after we set the config classloader.check-leaked-classloader = false it stopped happening.

Thanks. It seems there is a classloader leak somewhere. Did you use a MOR table with async compaction enabled?

danny0405 avatar Jan 11 '23 07:01 danny0405

You can set classloader.check-leaked-classloader: "false" in flink-conf.yaml (see the sketch below).

Dwrite avatar Sep 20 '24 09:09 Dwrite
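
For reference, the workaround mentioned in this thread corresponds to the following entry in Flink's conf/flink-conf.yaml. Note that, as the error message itself states, this only disables Flink's leaked-classloader check; the underlying classloader leak is not fixed by this setting.

```yaml
# conf/flink-conf.yaml
# Disables the check that raises "Trying to access closed classloader".
# The underlying classloader leak remains; this only silences the check.
classloader.check-leaked-classloader: false
```
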