Scala 3 lazy vals are not serialization-safe
Description
I'm reporting a vulnerability in Scala 3's lazy val implementation concerning Java serialization. JVM can serialize a lazy val field in the "Waiting" state should rhs evaluation take too long. However, because the thread that calls .countDown() isn't present on the recipient machine (the one deserializing the value), the lazy val remains set to Waiting forever, blocking all threads accessing the lazy val field on CountDownLatch#await() call:
"ClusterSystem-akka.actor.default-dispatcher-4" #31 [41987] prio=5 os_prio=31 cpu=52.91ms elapsed=15.05s tid=0x0000000120046a00 nid=41987 waiting on condition [0x000000017399d000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
- parking to wait for <0x000000061ff1fcc0> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:221)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:754)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly([email protected]/AbstractQueuedSynchronizer.java:1099)
at java.util.concurrent.CountDownLatch.await([email protected]/CountDownLatch.java:230)
at ClusterApp$Message.bomb$lzyINIT1(ClusterApp.scala:25)
at ClusterApp$Message.bomb(ClusterApp.scala:24)
at ClusterApp$WorkerNode$.apply$$anonfun$2$$anonfun$1(ClusterApp.scala:52)
at ClusterApp$WorkerNode$$$Lambda/0x00000088014197d8.apply(Unknown Source)
at akka.actor.typed.internal.BehaviorImpl$ReceiveMessageBehavior.receive(BehaviorImpl.scala:152)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:282)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:238)
at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:57)
at akka.actor.typed.internal.adapter.GuardianStopInterceptor.aroundReceive(GuardianStartupBehavior.scala:59)
at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:85)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:282)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:238)
at akka.actor.typed.internal.adapter.ActorAdapter.handleMessage(ActorAdapter.scala:133)
at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:109)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:577)
at akka.actor.ActorCell.invoke(ActorCell.scala:545)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:387)
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1312)
at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1843)
at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1808)
at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:188)
Compiler version
Scala 3.3.1+, probably Scala 3.3.0 too
Minimized code
https://github.com/VirtusLab/scala-3-lazy-val-vs-java-serialization
Workaround
It is possible to resolve this by annotating the lazy val field with @transient.
Expected outcome
We think that it would be a good idea for the compiler to generate all lazy val fields with transient modifier as this implementation will always be prone to this problem, especially given that LazyValControlState extends Serializable since #16806.
This is probably impossible to test correctly on our CI.
Once it is fixed, I suggest adding the reproducer repo to the community build. We can run two first steps from the readme with some predefined timeout, to test if the problem is fixed.
@WojciechMazur will this require significant changes to the OCB?
This could be simplified further to serialize to binary file but it would still require two jvms. Would that fit to CI?
will this require significant changes to the OCB?
No, it should be really straightforward. All we need is to include the reproducer repo in the OpenCB config https://github.com/VirtusLab/community-build3/blob/master/coordinator/configs/custom-projects.txt
Having it just compile will test nothing and with the current shape of the code it just hangs on one thread, there are no assertions as it's just demonstrating the problem. I think it can be greatly simplified (no akka for starters) and incorporated into the test suite of the language instead of OCB.
//> using scala 3.3.3
//> using jvm 21
//> using dep com.softwaremill.ox::core:0.2.2
import java.io.*
import ox.*
import scala.concurrent.duration.*
case class Message(content: String):
lazy val bomb: String =
sleep(200.millis)
"BOMB: " + content
def serialize(obj: Message): Array[Byte] =
val byteStream = ByteArrayOutputStream()
val objectStream = ObjectOutputStream(byteStream)
try
objectStream.writeObject(obj)
byteStream.toByteArray
finally
objectStream.close()
byteStream.close()
def deserialize(bytes: Array[Byte]): Message =
val byteStream = ByteArrayInputStream(bytes)
val objectStream = ObjectInputStream(byteStream)
try
objectStream.readObject().asInstanceOf[Message]
finally
objectStream.close()
byteStream.close()
@main def main =
val bytes = supervised:
val msg = Message("test")
fork:
msg.bomb // start evaluation before serialization
sleep(50.millis) // give some time for the fork to start lazy val rhs eval
serialize(msg) // serialize in the meantime so that we capture Waiting state
val deserializedMsg = deserialize(bytes)
unsupervised:
@volatile var msg = ""
val f = forkCancellable:
msg = deserializedMsg.bomb
sleep(1000.millis)
if !msg.isBlank then println(s"succeeded: $msg")
else
f.cancel()
println("failed to read bomb in 1s!")
arguably can be done without ox and jdk 21 :D one second please
//> using scala 3.3.3
import java.io.*
case class Message(content: String):
lazy val bomb: String =
Thread.sleep(200)
"BOMB: " + content
def serialize(obj: Message): Array[Byte] =
val byteStream = ByteArrayOutputStream()
val objectStream = ObjectOutputStream(byteStream)
try
objectStream.writeObject(obj)
byteStream.toByteArray
finally
objectStream.close()
byteStream.close()
def deserialize(bytes: Array[Byte]): Message =
val byteStream = ByteArrayInputStream(bytes)
val objectStream = ObjectInputStream(byteStream)
try
objectStream.readObject().asInstanceOf[Message]
finally
objectStream.close()
byteStream.close()
@main def main =
val bytes = locally:
val msg = Message("test")
val touch = Thread(() => {
msg.bomb // start evaluation before serialization
()
})
touch.start()
Thread.sleep(50) // give some time for the fork to start lazy val rhs eval
serialize(msg) // serialize in the meantime so that we capture Waiting state
val deserializedMsg = deserialize(bytes)
@volatile var msg = ""
@volatile var started = false
val read = Thread(() => {
started = true
msg = deserializedMsg.bomb
()
})
read.start()
Thread.sleep(1000)
if !started then throw Exception("wtf")
if !msg.isBlank then println(s"succeeded: $msg")
else
read.interrupt()
println("failed to read bomb in 1s!")
This even shows the nice stack trace from CountDownLatch#await():
Exception in thread "Thread-1" java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1100)
at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
at Message.bomb$lzyINIT1(main.scala:8)
at Message.bomb(main.scala:7)
at main$package$.$anonfun$2(main.scala:49)
at java.base/java.lang.Thread.run(Thread.java:1583)
failed to read bomb in 1s!
We just got that issue in production, not with Java serialization but with Kryo serialization instead. Trying to call a lazy val on a case class that was deserialized by Kryo led to the thread being stuck on that stack:
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
- parking to wait for <0x0000040c85642870> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:221)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:754)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly([email protected]/AbstractQueuedSynchronizer.java:1099)
at java.util.concurrent.CountDownLatch.await([email protected]/CountDownLatch.java:230)
at com.devsisters.ck.entities.AchievementCounter2.values$lzyINIT2(AchievementCounter.scala:65)
We use Scala version 3.6.3 so it should include the fix. Was the fix specific to Java serialization? Is there a way to fix it more generally?
@ghostdogpr I recall we had a conversation about this via DMs, right? Were you able to find a more general solution for Kryo for this in the end? I think something like additional entries for Kryo's registry for these state-encoding types from LazyVals were required but I'm not sure and it would be good to leave a trace and maybe update docs. Since lazy vals are getting another makeover for 3.8.0 (due to sun.misc.Unsafe sunsetting), it sounds like a good moment to check up on this.
@lbialy our fix was to add a custom serializer for scala.runtime.LazyVals.Waiting and scala.runtime.LazyVals.Evaluating.type. We contributed it to scala-kryo-serialization in https://github.com/altoo-ag/scala-kryo-serialization/pull/25. We also have a fix for Chill but the library is unmaintained (solution mentioned in this blog post, it's pretty similar). If the new solution in 3.8.0 requires the same kind of fix but with different classes, it will make it harder to add to libraries since those stay on the LTS and won't see these classes.