spring-framework
[CRaC] Fix hangup after restoring
I ran the following Spring Boot ApplicationRunner app and took a checkpoint with CRIU. After restoring, the app did not terminate.
@Override
public void run(ApplicationArguments args) throws Exception {
    if (args.containsOption("checkpoint")) {
        System.out.println("Ready to obtain checkpoint...");
        // Block here until the process has been restored
        cpCoordinator.await();
    }
    System.out.println("from Spring Boot App");
}
I took a thread dump and got the following stack trace. It shows that the "prevent-shutdown" thread started by the beforeCheckpoint CRaC handler is still waiting on a CyclicBarrier.
"prevent-shutdown" #29 [1504] prio=5 os_prio=0 cpu=0.17ms elapsed=25.76s tid=0x00007feb1017db00 nid=1504 waiting on condition [0x00007feb4e22b000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base/Native Method)
    - parking to wait for <0x000000008a9279b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(java.base/LockSupport.java:371)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base/AbstractQueuedSynchronizer.java:519)
    at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base/ForkJoinPool.java:3780)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base/ForkJoinPool.java:3725)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base/AbstractQueuedSynchronizer.java:1707)
    at java.util.concurrent.CyclicBarrier.dowait(java.base/CyclicBarrier.java:236)
    at java.util.concurrent.CyclicBarrier.await(java.base/CyclicBarrier.java:364)
    at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter.awaitPreventShutdownBarrier(DefaultLifecycleProcessor.java:634)
    at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter.lambda$beforeCheckpoint$0(DefaultLifecycleProcessor.java:606)
    at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter$$Lambda/0x00007feb501c37c0.run(Unknown Source)
    at java.lang.Thread.runWith(java.base/Thread.java:1596)
    at java.lang.Thread.run(java.base/Thread.java:1583)
I looked into CracResourceAdapter: the prevent-shutdown thread may reach its second awaitPreventShutdownBarrier() call if it runs before the awaitPreventShutdownBarrier() call in beforeCheckpoint().
We need separate barriers for beforeCheckpoint() and afterRestore() for this to work as expected.
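To illustrate the idea of one barrier per phase, here is a minimal standalone sketch. It is not the actual change in this PR and does not use the CRaC or Spring APIs: the class name SeparateBarriersSketch, the fields checkpointBarrier and restoreBarrier, and the method keepAliveRunning() are hypothetical, and beforeCheckpoint() / afterRestore() are plain methods standing in for the real handlers.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch: two independent two-party barriers, one per phase.
final class SeparateBarriersSketch {

    // An arrival on one barrier can never be matched against a wait
    // that belongs to the other phase.
    private final CyclicBarrier checkpointBarrier = new CyclicBarrier(2);
    private final CyclicBarrier restoreBarrier = new CyclicBarrier(2);
    private Thread keepAlive;

    // Stand-in for the beforeCheckpoint handler.
    void beforeCheckpoint() {
        keepAlive = new Thread(() -> {
            await(checkpointBarrier); // rendezvous with beforeCheckpoint()
            // The checkpoint and the restore would happen here.
            await(restoreBarrier);    // released only by afterRestore()
        }, "prevent-shutdown");
        keepAlive.setDaemon(false);
        keepAlive.start();
        await(checkpointBarrier);     // wait until the thread is running
    }

    // Stand-in for the afterRestore handler.
    void afterRestore() throws InterruptedException {
        await(restoreBarrier);        // release the keep-alive thread
        keepAlive.join();             // wait for it to finish
    }

    boolean keepAliveRunning() {
        return keepAlive != null && keepAlive.isAlive();
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted at barrier", ex);
        } catch (BrokenBarrierException ex) {
            throw new IllegalStateException("Barrier broken", ex);
        }
    }

    public static void main(String[] args) throws Exception {
        SeparateBarriersSketch sketch = new SeparateBarriersSketch();
        sketch.beforeCheckpoint();
        sketch.afterRestore();
        System.out.println("prevent-shutdown alive: " + sketch.keepAliveRunning());
    }
}
```

With separate barriers, the order in which the two threads reach their first await no longer matters: the second await in the prevent-shutdown thread can only be released by afterRestore(), so the thread ends and the JVM can exit once the application is done.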