Collection deserialization fails with cyclic object graph

Open lrytz opened this issue 9 months ago • 2 comments

Scala collections use a serialization proxy, which can leak during deserialization of a cyclic object graph.

Utility:

object SD {
  import java.io._, scala.util.chaining._
  def serialize(obj: AnyRef) = new ByteArrayOutputStream().tap(b => new ObjectOutputStream(b).writeObject(obj)).toByteArray
  def deserialize(a: Array[Byte]) = new ObjectInputStream(new ByteArrayInputStream(a)).readObject()
  def serializeDeserialize[T <: AnyRef](obj: T) = deserialize(serialize(obj)).asInstanceOf[T]
}

Test code:

  @Test def coll(): Unit = {
    val b = ListBuffer[AnyRef]()
    val bar = new Bar(b)
    b += bar
    SD.serializeDeserialize(b)
  }

This fails with

java.lang.ClassCastException:
cannot assign instance of scala.collection.generic.DefaultSerializationProxy
to field scala.collection.mutable.Bar.c of type scala.collection.mutable.Iterable
in instance of scala.collection.mutable.Bar
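
The Bar class is not shown in the test above. Going by the error message, it presumably has a single field c of type mutable.Iterable. The sketch below assumes that shape; the names BarRepro and roundTrip are made up for illustration, and Bar is nested in a plain object here instead of living in the scala.collection.mutable package.

```scala
import java.io._
import scala.collection.mutable
import scala.collection.mutable.ListBuffer

object BarRepro {
  // Assumed shape of Bar, inferred from the error message (field c: mutable.Iterable).
  class Bar(val c: mutable.Iterable[AnyRef]) extends Serializable

  // Hypothetical helper: serialize to a byte array and read the object back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  // Builds the cycle ListBuffer -> Bar -> ListBuffer and reports whether
  // the round trip fails with a ClassCastException.
  def failsWithClassCastException(): Boolean = {
    val b = ListBuffer[AnyRef]()
    b += new Bar(b)
    try { roundTrip(b); false }
    catch { case _: ClassCastException => true }
  }
}
```

On Scala 2.13, BarRepro.failsWithClassCastException() should return true, because the back-reference from Bar.c is read while the buffer is still represented by its DefaultSerializationProxy.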

A stand-alone reproducer:

class A(var b: B) extends Serializable {
  def writeReplace: AnyRef = new AProxy(this.b)
}

class AProxy(val b: B) extends Serializable {
  def readResolve: AnyRef = new A(b)
}

class B(val a: A) extends Serializable

Test code:

  @Test def repr(): Unit = {
    val a = new A(null)
    val b = new B(a)
    a.b = b
    SD.serializeDeserialize(a)
  }

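The readResolve method is only invoked once the AProxy instance is fully deserialized. Until then, references to the a object from within its own object graph resolve to the proxy, so the field B.a (of type A) is assigned an AProxy instance and deserialization fails.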

@retronym points out that this is documented in the last paragraph of https://docs.oracle.com/javase/8/docs/platform/serialization/spec/input.html#a5903

The readResolve method is not invoked on the object until the object is fully constructed, so any references to this object in its object graph will not be updated to the new object nominated by readResolve. [...] if the reference types [...] are not compatible, the construction of the object graph will raise a ClassCastException.
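
When the field type is compatible with the proxy, no exception is raised and the proxy simply stays in the graph. A minimal sketch of that case, using a variant of the reproducer in which the back-reference field is typed AnyRef (the names SilentLeak, A2, A2Proxy, B2 and roundTrip are invented for this illustration):

```scala
import java.io._

object SilentLeak {
  class A2(var b: B2) extends Serializable {
    def writeReplace: AnyRef = new A2Proxy(b)
  }

  class A2Proxy(val b: B2) extends Serializable {
    def readResolve: AnyRef = new A2(b)
  }

  // The back-reference is typed AnyRef, so assigning the proxy does not throw.
  class B2(val a: AnyRef) extends Serializable

  // Hypothetical helper: serialize to a byte array and read the object back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  // True if the back-reference inside the copy still points at the proxy
  // instead of the object returned by readResolve.
  def leaksProxy(): Boolean = {
    val a = new A2(null)
    a.b = new B2(a)
    val copy = roundTrip(a)
    copy.b.a.isInstanceOf[A2Proxy]
  }
}
```

Here the round trip completes, but copy.b.a is an A2Proxy and not copy itself, which is the "will not be updated" case from the specification.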

Links

  • https://bugs.openjdk.org/browse/JDK-8024931
  • Apache Fury: https://github.com/apache/fury/pull/1161#issuecomment-2702526803

The same behavior can be triggered with Java collections (again @retronym's example); the use of a serialization proxy is just less widespread in Java collections.

  @Test def jcoll(): Unit = {
    import java.util.{ArrayList => JAL}
    import java.util.{List => JL}
    val c1 = new JAL[JL[_]]()
    val c2 = JL.of(c1)
    c1.add(c2)
    val c2c = SD.serializeDeserialize(c2)
    c2c.get(0).get(0).size() // ClassCastException: class java.util.CollSer cannot be cast to class java.util.List
  }

lrytz, Mar 06 '25 12:03

Noting that this was already present in 2.12 if the circularly referenced collection was one of the ones that used serialization proxies (immutable.List notably). Scala 2.13 uses them pervasively, so it is more exposed to the issue.

A workaround may be to indirect the reference through a wrapper that does not use the serialization proxy pattern.
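
One possible reading of this workaround, as a sketch only: the element points back at a plain wrapper object instead of at the collection, and serialization starts at the wrapper. The names WrapperWorkaround, Holder, Elem and roundTrip are hypothetical. As far as I can tell, this only helps when the cycle is entered through the wrapper; if the collection itself is the first object written, the back-reference is still read while the proxy is unresolved.

```scala
import java.io._
import scala.collection.mutable
import scala.collection.mutable.ListBuffer

object WrapperWorkaround {
  // Hypothetical wrapper with default serialization (no writeReplace / readResolve).
  class Holder(var coll: mutable.Iterable[AnyRef]) extends Serializable

  // The element refers back to the wrapper, not to the collection directly.
  class Elem(val owner: Holder) extends Serializable

  // Hypothetical helper: serialize to a byte array and read the object back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  // Serializes the cycle starting at the wrapper and checks that the copy
  // holds a real ListBuffer whose element points back at the copied wrapper.
  def cycleSurvives(): Boolean = {
    val buf = ListBuffer[AnyRef]()
    val holder = new Holder(buf)
    buf += new Elem(holder)
    val copy = roundTrip(holder)
    val elem = copy.coll.head.asInstanceOf[Elem]
    copy.coll.isInstanceOf[ListBuffer[_]] && (elem.owner eq copy)
  }
}
```

The back-reference now targets the Holder, which already has its final identity when it is read, and the proxy is resolved before it is stored into Holder.coll.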

retronym, Mar 06 '25 14:03

The JDK could probably make it work for classes that use the default serialization. JDK deserialization uses Unsafe.putObject(obj, fieldOffset, value). That call can be delayed if value has a readResolve method. Once the readResolve is actually called, the resulting object can be stored in the field. I did an experiment with ByteBuddy: https://github.com/lrytz/scala/tree/t13092.

But the issue is with classes that implement their own writeObject / readObject, like our DefaultSerializationProxy. The readObject method does

      while(count < k) {
        builder += in.readObject().asInstanceOf[A]
        count += 1
      }

where ObjectInputStream.readObject can return a proxy in case of cycles. Example with only collections:

    val b1 = ListBuffer[ListBuffer[AnyRef]]()
    val b2 = ListBuffer[AnyRef](b1)
    b1 += b2
    val b1c = SD.serializeDeserialize(b1)
    println(b1c.head.head.getClass) // DefaultSerializationProxy

I also saw that there are writeUnshared / readUnshared methods in ObjectOutputStream / ObjectInputStream, but I don't see how they would help: duplicating the proxies would lead to separate collection instances on deserialization.

lrytz, Mar 07 '25 12:03