
[CRaC] Fix hangup after restoring

Open YaSuenag opened this issue 10 months ago • 3 comments

I ran the following ApplicationRunner Spring Boot app and took a checkpoint with CRIU. After restoring, the app did not finish.

  @Override
  public void run(ApplicationArguments args) throws Exception {
    if(args.containsOption("checkpoint")){
      System.out.println("Ready to obtain checkpoint...");
      // Block here until the process is restored
      cpCoordinator.await();
    }
    System.out.println("from Spring Boot App");
  }

I took a thread dump and got the following stack trace. It shows the prevent-shutdown thread, started by the beforeCheckpoint CRaC handler, waiting on a CyclicBarrier.

"prevent-shutdown" #29 [1504] prio=5 os_prio=0 cpu=0.17ms elapsed=25.76s tid=0x00007feb1017db00 nid=1504 waiting on condition  [0x00007feb4e22b000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000000008a9279b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:371)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block([email protected]/AbstractQueuedSynchronizer.java:519)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3780)
        at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3725)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:1707)
        at java.util.concurrent.CyclicBarrier.dowait([email protected]/CyclicBarrier.java:236)
        at java.util.concurrent.CyclicBarrier.await([email protected]/CyclicBarrier.java:364)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter.awaitPreventShutdownBarrier(DefaultLifecycleProcessor.java:634)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter.lambda$beforeCheckpoint$0(DefaultLifecycleProcessor.java:606)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter$$Lambda/0x00007feb501c37c0.run(Unknown Source)
        at java.lang.Thread.runWith([email protected]/Thread.java:1596)
        at java.lang.Thread.run([email protected]/Thread.java:1583)

I investigated CracResourceAdapter. The prevent-shutdown thread might pass through its second awaitPreventShutdownBarrier() call if that thread runs before the awaitPreventShutdownBarrier() call in beforeCheckpoint().

We need separate barriers for beforeCheckpoint and afterRestore for this to work as expected.
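To illustrate the idea, here is a minimal, self-contained sketch of one barrier per phase. It is not the actual DefaultLifecycleProcessor code: the class name, field names, and `demo()` helper are hypothetical, and the CRaC callbacks are simulated by plain method calls instead of the real `org.crac.Resource` interface. The timeouts are only there so the sketch cannot hang.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one CyclicBarrier per phase, so an arrival that belongs
// to the "before checkpoint" handshake can never be consumed by the
// "after restore" handshake, or the other way around.
public class SeparateBarriersSketch {

    private final CyclicBarrier beforeCheckpointBarrier = new CyclicBarrier(2);
    private final CyclicBarrier afterRestoreBarrier = new CyclicBarrier(2);
    private Thread keeper;

    // Stand-in for the beforeCheckpoint callback (assumption, not the Spring API).
    void beforeCheckpoint() throws Exception {
        keeper = new Thread(() -> {
            try {
                // Phase 1: confirm to the caller that this thread is running.
                beforeCheckpointBarrier.await(5, TimeUnit.SECONDS);
                // The checkpoint and restore would happen while parked here.
                // Phase 2: only afterRestore() can release this wait.
                afterRestoreBarrier.await(5, TimeUnit.SECONDS);
            } catch (Exception ex) {
                // Timeout, broken barrier, or interrupt: let the thread end.
            }
        }, "prevent-shutdown");
        keeper.setDaemon(false); // non-daemon, so the JVM stays alive
        keeper.start();
        beforeCheckpointBarrier.await(5, TimeUnit.SECONDS);
    }

    // Stand-in for the afterRestore callback (assumption, not the Spring API).
    void afterRestore() throws Exception {
        afterRestoreBarrier.await(5, TimeUnit.SECONDS);
    }

    // Runs both phases in order and reports whether the keeper thread ended.
    static boolean demo() throws Exception {
        SeparateBarriersSketch sketch = new SeparateBarriersSketch();
        sketch.beforeCheckpoint();
        sketch.afterRestore();
        sketch.keeper.join(TimeUnit.SECONDS.toMillis(5));
        return !sketch.keeper.isAlive();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("keeper finished: " + demo());
    }
}
```

With two barriers, the order in which the prevent-shutdown thread and the beforeCheckpoint caller reach the first await no longer matters, because the second await can only be paired with the call made from afterRestore.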

YaSuenag avatar Feb 06 '25 03:02 YaSuenag