Collection deserialization fails with cyclic object graph
Scala collections are serialized through a serialization proxy, which can leak into the result when a cyclic object graph is deserialized.
Utility:
object SD {
import java.io._, scala.util.chaining._
def serialize(obj: AnyRef) = new ByteArrayOutputStream().tap(b => new ObjectOutputStream(b).writeObject(obj)).toByteArray
def deserialize(a: Array[Byte]) = new ObjectInputStream(new ByteArrayInputStream(a)).readObject()
def serializeDeserialize[T <: AnyRef](obj: T) = deserialize(serialize(obj)).asInstanceOf[T]
}
Test code:
@Test def coll(): Unit = {
val b = ListBuffer[AnyRef]()
val bar = new Bar(b)
b += bar
SD.serializeDeserialize(b)
}
This fails with
java.lang.ClassCastException:
cannot assign instance of scala.collection.generic.DefaultSerializationProxy
to field scala.collection.mutable.Bar.c of type scala.collection.mutable.Iterable
in instance of scala.collection.mutable.Bar
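The Bar class is not shown above. The error message suggests it has a field c of type scala.collection.mutable.Iterable, so the following self-contained sketch assumes a hypothetical Bar along those lines, declared in the default package rather than in scala.collection.mutable. BarRepro is an illustrative name; its roundTrip helper does the same thing as SD.serializeDeserialize, and run() returns the ClassCastException instead of throwing it.

```scala
import java.io._
import scala.collection.mutable
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the issue's Bar (assumed from the error message:
// a field `c` of type scala.collection.mutable.Iterable).
class Bar(val c: mutable.Iterable[AnyRef]) extends Serializable

object BarRepro {
  // Serialize to a byte array and read it back, like SD.serializeDeserialize.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Builds the cycle b -> bar -> b and returns the exception raised while reading it, if any.
  def run(): Option[ClassCastException] = {
    val b = ListBuffer[AnyRef]()
    b += new Bar(b)
    try { roundTrip(b); None }
    catch { case e: ClassCastException => Some(e) }
  }
}
```

The exception is raised when Bar.c is assigned the DefaultSerializationProxy that stands in for b while b is still being read.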
A stand-alone reproducer:
class A(var b: B) extends Serializable {
def writeReplace: AnyRef = new AProxy(this.b)
}
class AProxy(val b: B) extends Serializable {
def readResolve: AnyRef = new A(b)
}
class B(val a: A) extends Serializable
Test code:
@Test def repr(): Unit = {
val a = new A(null)
val b = new B(a)
a.b = b
SD.serializeDeserialize(a)
}
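The readResolve method is only invoked once the AProxy instance is fully deserialized. While the AProxy is still being read, references back to the original a object resolve to the proxy. So when B.a is assigned, the value is the AProxy instance, which is not an A, and the assignment fails with a ClassCastException.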
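To observe the leaked proxy directly, the back-reference can be typed AnyRef so that the field store cannot fail. The sketch below is a hypothetical variant of the stand-alone reproducer (A2, A2Proxy, B2 and LeakDemo are illustrative names, not from the issue). After the round trip, the top-level result is the A2 produced by readResolve, but its B2 still points at the A2Proxy.

```scala
import java.io._

// Same shape as A / AProxy / B above, but B2 stores its back-reference as AnyRef.
class A2(var b: B2) extends Serializable {
  def writeReplace(): AnyRef = new A2Proxy(b)
}
class A2Proxy(val b: B2) extends Serializable {
  def readResolve(): AnyRef = new A2(b)
}
class B2(val a: AnyRef) extends Serializable

object LeakDemo {
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Builds the cycle a -> b -> a and deserializes it; no exception because B2.a is AnyRef.
  def run(): A2 = {
    val a = new A2(null)
    a.b = new B2(a)
    roundTrip(a)
  }
}
```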
@retronym points out that this is documented in the last paragraph of https://docs.oracle.com/javase/8/docs/platform/serialization/spec/input.html#a5903:
The readResolve method is not invoked on the object until the object is fully constructed, so any references to this object in its object graph will not be updated to the new object nominated by readResolve. [...] if the reference types [...] are not compatible, the construction of the object graph will raise a ClassCastException.
Links
- https://bugs.openjdk.org/browse/JDK-8024931
- Apache Fury: https://github.com/apache/fury/pull/1161#issuecomment-2702526803
The same behavior can be triggered with Java collections (again @retronym's example); serialization proxies are just less widespread in Java collections.
@Test def jcoll(): Unit = {
import java.util.{ArrayList => JAL}
import java.util.{List => JL}
val c1 = new JAL[JL[_]]()
val c2 = JL.of(c1)
c1.add(c2)
val c2c = SD.serializeDeserialize(c2)
c2c.get(0).get(0).size() // ClassCastException: class java.util.CollSer cannot be cast to class java.util.List
}
Note that this was already present in 2.12 when the circularly referenced collection was one of those that used serialization proxies (notably immutable.List). Scala 2.13 uses them pervasively, so it is more exposed to the issue.
A workaround may be to indirect the reference through a wrapper that does not use the serialization proxy pattern.
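One possible reading of this workaround, sketched with hypothetical Holder and Item classes: elements refer back to a plain Holder (default serialization, no writeReplace), and the Holder owns the ListBuffer. The back-edge of the cycle then targets the Holder, which is a real object while the graph is being read. This sketch assumes serialization starts from the Holder; if the ListBuffer itself is the root, Holder.items is still assigned the unresolved proxy.

```scala
import java.io._
import scala.collection.mutable.ListBuffer

// Hypothetical wrapper using default serialization (no writeReplace), so it is never proxied.
final class Holder(var items: ListBuffer[AnyRef]) extends Serializable

// Hypothetical element that refers back to the collection only through the Holder.
final class Item(val owner: Holder) extends Serializable

object WrapperWorkaround {
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Cycle: holder -> items -> Item -> holder.
  def sample(): Holder = {
    val holder = new Holder(ListBuffer.empty[AnyRef])
    holder.items += new Item(holder)
    holder
  }

  // Root is the wrapper: the back-edge points at Holder, not at the proxied ListBuffer.
  def fromWrapper(): Holder = roundTrip(sample())

  // Root is the collection: Holder.items receives the unresolved proxy, so reading fails.
  def fromCollectionFails(): Boolean =
    try { roundTrip(sample().items); false }
    catch { case _: ClassCastException => true }
}
```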
The JDK could probably make it work for classes that use default serialization. JDK deserialization stores field values with Unsafe.putObject(obj, fieldOffset, value). That call could be delayed if value is an object whose readResolve method has not been invoked yet; once readResolve has actually been called, the resulting object can be stored in the field. I did an experiment with ByteBuddy: https://github.com/lrytz/scala/tree/t13092.
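The delayed-store idea can be illustrated with a toy model. This is a hand-written simulation under simplifying assumptions, not the JDK's code and not the ByteBuddy experiment: Entry stands in for a handle-table slot whose object may still be an unresolved proxy, field stores that target such an entry are queued, and they are replayed once readResolve has produced the real object. All names are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model only: mimics deferring field stores until readResolve has run.
object DelayedStoreModel {
  // A handle-table slot: the object read so far, and whether readResolve already ran.
  final class Entry(var obj: AnyRef, var resolved: Boolean) {
    val pendingStores = ArrayBuffer.empty[AnyRef => Unit]
  }

  // Store the entry's object into a field now, or defer the store while it is still a proxy.
  def storeField(e: Entry, setField: AnyRef => Unit): Unit =
    if (e.resolved) setField(e.obj) else e.pendingStores += setField

  // The proxy is fully read: run readResolve, then replay every deferred store.
  def finish(e: Entry, readResolve: AnyRef => AnyRef): Unit = {
    e.obj = readResolve(e.obj)
    e.resolved = true
    e.pendingStores.foreach(store => store(e.obj))
    e.pendingStores.clear()
  }

  // Classes mirroring the stand-alone reproducer.
  final class A(var b: B)
  final class B(var a: A)
  final class AProxy(var b: B) { def readResolve(): AnyRef = new A(b) }

  // Hand-simulates reading the stream written for the cycle a -> b -> a.
  def simulate(): A = {
    val proxyEntry = new Entry(new AProxy(null), resolved = false) // AProxy allocated, handle assigned
    val b = new B(null)                                           // reading AProxy.b allocates a B
    storeField(proxyEntry, v => b.a = v.asInstanceOf[A])          // B.a points back at the proxy: deferred
    proxyEntry.obj.asInstanceOf[AProxy].b = b                     // AProxy.b is an ordinary object: stored now
    finish(proxyEntry, p => p.asInstanceOf[AProxy].readResolve())
    proxyEntry.obj.asInstanceOf[A]
  }
}
```

The model only covers stores that the runtime performs itself. A custom readObject that consumes the result of in.readObject() directly has already received the proxy, and there is no later store to patch.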
The problem is with classes that implement their own writeObject / readObject, like our DefaultSerializationProxy. Its readObject method does
while(count < k) {
builder += in.readObject().asInstanceOf[A]
count += 1
}
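where ObjectInputStream.readObject can return a not-yet-resolved proxy in case of cycles. The value is passed straight to builder +=, so there is no field store that could be delayed. Example with only collections: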
val b1 = ListBuffer[ListBuffer[AnyRef]]()
val b2 = ListBuffer[AnyRef](b1)
b1 += b2
val b1c = SD.serializeDeserialize(b1)
println(b1c.head.head.getClass) // DefaultSerializationProxy
I also saw that there are writeUnshared / readUnshared methods in ObjectOutputStream / ObjectInputStream. But I don't see how they would help: duplicating the proxies would lead to separate collection instances on deserialization.
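The duplication effect can be checked with a small sketch (UnsharedDemo is a hypothetical name). Writing the same ListBuffer twice with writeObject produces a back-reference, so both reads return the same instance; writing it twice with writeUnshared writes the proxy twice, so deserialization builds two separate, equal buffers.

```scala
import java.io._
import scala.collection.mutable.ListBuffer

object UnsharedDemo {
  // Writes `obj` twice (shared or unshared) and reads both copies back in order.
  def writeTwiceReadTwice(obj: AnyRef, unshared: Boolean): (AnyRef, AnyRef) = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    if (unshared) { out.writeUnshared(obj); out.writeUnshared(obj) }
    else { out.writeObject(obj); out.writeObject(obj) }
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try (in.readObject(), in.readObject()) finally in.close()
  }
}
```

For example, with lb = ListBuffer(1, 2, 3), the shared variant returns the same object twice, while the unshared variant returns two distinct ListBuffers with the same elements.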