
Spring Batch job terminates in STARTED status after SIGTERM

Open • esoni opened this issue 4 years ago

Hi all, I have a job that runs on k8s. During a shutdown for a cluster update, k8s sends SIGTERM to my container. My goal is to:

  • handle SIGTERM in a function
  • in this function, call stop on the job and wait for it to terminate

I am able to add a shutdown hook using Runtime.getRuntime().addShutdownHook(new Thread() { ... }), but in the meantime Spring Batch terminates automatically when I receive SIGTERM, and the job remains in STARTED status in the database. How can I customize this behavior? I want to stop the job correctly when I receive SIGTERM.

esoni, Oct 31 '21 09:10

Hi everyone, any update on this?

We're experiencing the same situation and are evaluating how to tackle it. However, we're wondering whether this should be handled by third-party code or covered by the batch framework itself, because, as @esoni described, this bug leads to inconsistencies: the job status in the database is STARTED even though no job execution is running in the services.

Jacopo47, Aug 29 '22 08:08

During a shutdown for a cluster update k8s send SIGTERM to my container.

Are you using Docker or another container runtime? If using Docker, are you using the shell form of the entry point? I am asking because the shell form of Docker's ENTRYPOINT does not pass Unix signals to the sub-process running in the container:

The shell form prevents any CMD or run command line arguments from being used, but has the disadvantage that your ENTRYPOINT will be started as a subcommand of /bin/sh -c, which does not pass signals. This means that the executable will not be the container’s PID 1 - and will not receive Unix signals - so your executable will not receive a SIGTERM from docker stop <container>.
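Since the quoted documentation ties signal delivery to the executable being the container's PID 1, a quick diagnostic is to have the JVM report its own process ID at startup. The sketch below is a hypothetical helper (Pid1Check is not part of Spring Batch) and assumes Java 9+ for ProcessHandle. Not being PID 1 does not automatically mean signals are lost, since a minimal init process such as tini can sit in front of the JVM and forward them.

```java
// Diagnostic sketch (Java 9+). Pid1Check is a hypothetical helper, not a Spring Batch type.
public class Pid1Check {

    /** True when this JVM is PID 1 in its PID namespace (e.g. started with the exec form). */
    static boolean isPid1() {
        return ProcessHandle.current().pid() == 1L;
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid();
        if (isPid1()) {
            System.out.println("JVM is PID 1: a SIGTERM sent to the container reaches it directly");
        } else {
            // Under a shell-form ENTRYPOINT the JVM is a child of /bin/sh -c, so it is not PID 1
            System.out.println("JVM is PID " + pid + ": check whether a shell wrapper is PID 1 instead");
        }
    }
}
```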

So for the Spring Batch job running in a container to intercept Unix signals correctly, ENTRYPOINT should use the exec form. To my knowledge, the only way the JVM offers to intercept external signals is shutdown hooks. So if we really want to stop a job when receiving a SIGTERM, we should add a shutdown hook that calls JobOperator.stop. However, this approach is not guaranteed to work, because the JVM does not guarantee that shutdown hooks will run. Here is an excerpt from the Javadoc of the Runtime.addShutdownHook method:
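A minimal, framework-free sketch of the shutdown-hook approach follows. SimulatedJob and stopAndAwait are hypothetical names: requestStop() stands in for JobOperator.stop(executionId), and the status field stands in for the status Spring Batch keeps in the job repository. As an assumption, the simulated job honors a stop request only between chunks, so the hook has to wait for the current chunk to finish; the wait is bounded so the hook cannot block shutdown forever. In a real application the stop call would more likely go to JobOperator.stop for each execution returned by JobExplorer.findRunningJobExecutions(jobName).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class GracefulStopSketch {

    /** Hypothetical stand-in for a running job execution; not a Spring Batch type. */
    static final class SimulatedJob implements Runnable {
        final AtomicReference<String> status = new AtomicReference<>("STARTED");
        final CountDownLatch finished = new CountDownLatch(1);
        private final int chunks;
        private volatile boolean stopRequested;

        SimulatedJob(int chunks) {
            this.chunks = chunks;
        }

        /** Stand-in for JobOperator.stop(executionId): it only sets a flag. */
        void requestStop() {
            stopRequested = true;
        }

        @Override
        public void run() {
            try {
                for (int i = 0; i < chunks; i++) {
                    if (stopRequested) {      // the stop request is honored between chunks only
                        status.set("STOPPED");
                        return;
                    }
                    Thread.sleep(50);         // pretend to process one chunk
                }
                status.set("COMPLETED");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                status.set("FAILED");
            } finally {
                finished.countDown();         // signal that a final status has been recorded
            }
        }
    }

    /** Shutdown-hook body: request a stop, then wait at most timeoutMillis for a final status. */
    static boolean stopAndAwait(SimulatedJob job, long timeoutMillis) {
        job.requestStop();
        try {
            return job.finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        SimulatedJob job = new SimulatedJob(1_000);
        Thread worker = new Thread(job, "batch-worker");
        worker.setDaemon(true); // demo only: lets the JVM begin shutting down as soon as main returns
        worker.start();

        // A SIGTERM (or System.exit) makes the JVM run the registered shutdown hooks.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            boolean done = stopAndAwait(job, 5_000);
            System.out.println("status at shutdown: " + job.status.get() + (done ? "" : " (timed out)"));
        }, "graceful-stop-hook"));
    }
}
```

Running main lets the JVM start its shutdown sequence immediately, so the hook should report STOPPED; a SIGTERM to a long-running process triggers the same hooks, while a SIGKILL skips them entirely.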

In rare circumstances the virtual machine may abort, that is, stop running without shutting down cleanly. This occurs when the virtual machine is terminated externally, for example with the SIGKILL signal on Unix or the TerminateProcess call on Microsoft Windows.

Moreover, shutdown hooks are expected to run "quickly":

Shutdown hooks should also finish their work quickly. When a program invokes exit the expectation is that the virtual machine will promptly shut down and exit.

and JobOperator.stop, which involves a database transaction that might cross a network to update the job's status, is not likely to be a "quick" enough operation.
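Because the stop call may involve a database round trip, one hedged option is to run it on a separate thread and wait only for a fixed budget, so a slow or unreachable database cannot stall the whole shutdown. BoundedStop and runWithinBudget are hypothetical names, and stopAction is a placeholder for something like JobOperator.stop(executionId). The 10-second budget is an arbitrary assumption meant to stay well below Kubernetes' default terminationGracePeriodSeconds of 30 seconds, after which the container is killed with SIGKILL. If the budget expires, the job status may still remain STARTED, which is the limitation described above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStop {

    /**
     * Runs stopAction (hypothetical stand-in for a call such as JobOperator.stop) on a
     * separate thread and waits at most budgetMillis. Returns true if it finished in time.
     */
    static boolean runWithinBudget(Runnable stopAction, long budgetMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-stop");
            t.setDaemon(true); // a hung stop call must not keep the JVM alive
            return t;
        });
        try {
            Future<?> future = executor.submit(stopAction);
            future.get(budgetMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // give up waiting; the stored status may stay STARTED
        } catch (ExecutionException e) {
            return false; // the stop call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // interrupt the stop call if it is still running
        }
    }

    public static void main(String[] args) {
        long budgetMillis = 10_000; // assumption: well under the 30 s default grace period
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            boolean ok = runWithinBudget(() -> System.out.println("stop requested"), budgetMillis);
            System.out.println(ok ? "stop finished in time" : "stop did not finish; status may remain STARTED");
        }));
    }
}
```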

I was planning to write a detailed blog post on this matter explaining what Spring Batch can and cannot do in such circumstances. The best way to start that experiment is to validate how the framework behaves when it receives a SIGTERM signal, both with and without a shutdown hook.


Some related references:

fmbenhassine, Apr 27 '23 13:04