Encountered ArrayIndexOutOfBoundsException when running in Apache Spark 3.3.1
**Describe the bug**

Kryo throws `java.lang.ArrayIndexOutOfBoundsException: -2` while deserializing a row schema (`GenericRowWithSchema`) when running in Apache Spark 3.3.1:
```
[13:42:25:880] [Executor task launch worker for task 16912.3 in stage 27.0 (TID 43832)] ERROR org.apache.spark.executor.Executor.logError:98 - Exception in task 16912.3 in stage 27.0 (TID 43832)
com.esotericsoftware.kryo.KryoException: java.lang.ArrayIndexOutOfBoundsException: -2
Serialization trace:
numeric (org.apache.spark.sql.types.DecimalType)
dataType (org.apache.spark.sql.types.StructField)
fields (org.apache.spark.sql.types.StructType)
schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) ~[kryo-shaded-4.0.2.jar:?]
	at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:306) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:168) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32) ~[scala-library-2.12.15.jar:?]
	at org.apache.livy.thriftserver.session.RDDStreamIterator.lambda$collectPartitionSize$fb554325$1(RDDStreamIterator.java:87) ~[livy-thriftserver-session-0.8.0-incubating.jar:0.8.0-incubating]
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions$1(JavaRDDLike.scala:153) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:857) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:857) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.scheduler.Task.run(Task.scala:136) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1556) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.1.jar:3.3.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
Caused by: java.lang.ArrayIndexOutOfBoundsException: -2
	at java.util.ArrayList.elementData(ArrayList.java:424) ~[?:1.8.0_352]
	at java.util.ArrayList.get(ArrayList.java:437) ~[?:1.8.0_352]
	at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:729) ~[kryo-shaded-4.0.2.jar:?]
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ~[kryo-shaded-4.0.2.jar:?]
	... 34 more
```
**To Reproduce**

No minimal reproducible example is available yet.
**Environment:**
- OS: CentOS
- JDK Version: 8
- Kryo Version: 4.0.2
**Additional context**

Running Apache Spark with Kryo version 4.0.2. Is this a bug? Is it fixed in the latest version?
Please provide a minimal reproducer for this issue. It's not possible to diagnose the problem from the stack trace alone.
I can't reproduce it. Why do we allow an `ArrayIndexOutOfBoundsException` to escape from Kryo's code? Should `Kryo.readReferenceOrNull` do something in advance when the reference id is illegal (like -2)? @theigl
There is not much we can do. The data seems to have been corrupted somehow and cannot be read.
Kryo could only throw a different exception, and this is what we do in Kryo 5: https://github.com/EsotericSoftware/kryo/blob/47d90673daac32af7152d76eef2e1261852d2a58/src/com/esotericsoftware/kryo/Kryo.java#L925
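To illustrate the difference, here is a deliberately simplified sketch (class and method names are hypothetical, not Kryo's real API): Kryo 4's reference resolver indexes straight into its backing list with the id read from the stream, so a corrupted negative id like -2 surfaces as a raw `ArrayIndexOutOfBoundsException` from `ArrayList.get`, while validating the id first lets the library fail with a descriptive exception that points at stream corruption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of reference-id handling during
// deserialization; illustrative only, not Kryo's actual code.
public class ReferenceResolverSketch {
    // Objects already read from the stream, indexed by reference id.
    private final List<Object> readObjects = new ArrayList<>();

    public void addReadObject(Object o) {
        readObjects.add(o);
    }

    // Kryo 4 style: index directly into the list. A corrupted id such
    // as -2 escapes as a raw IndexOutOfBoundsException from ArrayList.
    public Object getReadObjectUnchecked(int id) {
        return readObjects.get(id);
    }

    // Kryo 5 style: validate the id first and raise a descriptive
    // error, so the caller sees "corrupted stream" instead of AIOOBE.
    public Object getReadObjectChecked(int id) {
        if (id < 0 || id >= readObjects.size()) {
            throw new IllegalStateException(
                "Unable to resolve reference id " + id
                + " (corrupted or truncated stream?)");
        }
        return readObjects.get(id);
    }

    public static void main(String[] args) {
        ReferenceResolverSketch resolver = new ReferenceResolverSketch();
        resolver.addReadObject("first object");

        try {
            resolver.getReadObjectUnchecked(-2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked: raw " + e.getClass().getSimpleName());
        }

        try {
            resolver.getReadObjectChecked(-2);
        } catch (IllegalStateException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

Either way the underlying data is unreadable; the checked variant only improves the error report, which matches the maintainer's point that the corruption itself cannot be fixed at read time.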