[Flaky Test] RepartitionWithHadoopHybridStorageRssTest#resultCompareTest

Open xianjingfeng opened this issue 1 year ago • 0 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the flaky test

Error:  resultCompareTest  Time elapsed: 24.11 s  <<< ERROR!
org.apache.spark.SparkException: 
Job aborted due to stage failure: Task 3 in stage 2.0 failed 1 times, most recent failure: Lost task 3.0 in stage 2.0 (TID 12) (fv-az694-448.54qhn1hql4lezjuaogcugbdfcb.ex.internal.cloudapp.net executor driver): org.apache.uniffle.common.exception.RssFetchFailedException: Failed to read shuffle data from HOT handler
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:124)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:273)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:185)
	at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:115)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.hasNext(RssShuffleReader.java:297)
	at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:155)
	at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
	at org.apache.spark.shuffle.reader.RssShuffleReader.read(RssShuffleReader.java:136)
	at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
	at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1454)
	at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1383)
	at io.netty.buffer.PooledByteBuf.duplicateInternalNioBuffer(PooledByteBuf.java:194)
	at io.netty.buffer.PooledByteBuf.nioBuffer(PooledByteBuf.java:211)
	at io.netty.buffer.AbstractByteBuf.nioBuffer(AbstractByteBuf.java:1231)
	at org.apache.uniffle.common.netty.buffer.NettyManagedBuffer.nioByteBuffer(NettyManagedBuffer.java:48)
	at org.apache.uniffle.common.ShuffleIndexResult.getIndexData(ShuffleIndexResult.java:62)
	at org.apache.uniffle.common.segment.LocalOrderSegmentSplitter.split(LocalOrderSegmentSplitter.java:67)
	at org.apache.uniffle.storage.handler.impl.DataSkippableReadHandler.readShuffleData(DataSkippableReadHandler.java:80)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:113)
	... 25 more


Actions URL

https://github.com/xianjingfeng/incubator-uniffle/actions/runs/7708416244/job/21007443535

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

xianjingfeng • Jan 30 '24 09:01