Handling Graceful Shutdown in Spring Batch
Bug description
I have an application that uses remote partitioned batch jobs, which are sent to the workers via JMS.
I also have a ThreadPoolTaskExecutor configured on the worker side, so that chunks can be processed in parallel.
I was testing the graceful shutdown behavior on the worker side.
One of the test cases was to check what happens when a step on the remote side takes longer to process than the graceful period.
The expected outcome in this case is that, once the graceful period expires, the partition step terminates and the step state is saved as STOPPED in the database.
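The graceful period described above follows the usual "wait, then interrupt" pattern. Below is a minimal sketch of that pattern with plain java.util.concurrent, not Spring or Spring Batch code. It assumes a hypothetical stand-in task (longStep) that reacts to interruption by reporting STOPPED; the class and method names are illustrative only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class GracefulShutdownSketch {

    // Hypothetical stand-in for a long-running partition step: it reports
    // COMPLETED if it finishes and STOPPED if it is interrupted first.
    static Runnable longStep(AtomicReference<String> status, long workMillis) {
        return () -> {
            try {
                Thread.sleep(workMillis); // simulates slow chunk processing
                status.set("COMPLETED");
            } catch (InterruptedException e) {
                status.set("STOPPED");
                Thread.currentThread().interrupt(); // keep the interrupt visible
            }
        };
    }

    // Waits up to gracePeriodMillis for running tasks, then interrupts them.
    // Returns true if the executor terminated.
    static boolean shutdownGracefully(ExecutorService executor, long gracePeriodMillis) {
        executor.shutdown(); // stop accepting new tasks, let running ones finish
        try {
            if (executor.awaitTermination(gracePeriodMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            executor.shutdownNow(); // grace period expired: interrupt the workers
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If the task instead caught the interrupt and kept looping, the second awaitTermination would time out and return false, which corresponds to the hang described in this issue.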
In my case, the application just hangs: Spring is unable to fully close the application context in this scenario. It loops endlessly in RepeatTemplate.executeInternal(). This calls TaskExecutorRepeatTemplate.getNextResult(), which calls runnable.expect(), which in turn calls queue.expect(). Since Spring is already trying to interrupt everything, this call fails with an InterruptedException, which is then translated into a RepeatException.
https://github.com/spring-projects/spring-batch/blob/e6c27273fa2b3713c6f2d472bf3de1b18f8e5eba/spring-batch-infrastructure/src/main/java/org/springframework/batch/repeat/support/RepeatTemplate.java#L204-L217
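Part of why this loop never ends is a general Java rule: catching InterruptedException clears the thread's interrupted status, so once the exception has been translated and handled, nothing tells the loop to stop. The following is a minimal stdlib sketch of a blocking loop that respects the interrupt. It assumes a hypothetical drain loop over a BlockingQueue that loosely mirrors queue.expect(). It illustrates the rule only and is not the RepeatTemplate implementation or a proposed patch.

```java
import java.util.concurrent.BlockingQueue;

final class InterruptAwareLoop {

    // Hypothetical stand-in for the repeat loop: keeps waiting for results
    // until a "COMPLETE" marker arrives or the thread is interrupted.
    // Returns the number of iterations performed.
    static int drain(BlockingQueue<String> queue) {
        int iterations = 0;
        boolean running = true;
        while (running) {
            iterations++;
            try {
                // take() throws InterruptedException (clearing the flag)
                // if the thread is interrupted before or while waiting
                if ("COMPLETE".equals(queue.take())) {
                    running = false;
                }
            } catch (InterruptedException e) {
                // Stop looping and re-assert the interrupt so that callers
                // further up the stack can still see it.
                Thread.currentThread().interrupt();
                running = false;
            }
        }
        return iterations;
    }
}
```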
Here, a couple of things can fail:
- doHandle calls DefaultExceptionHandler, which rethrows the unwrapped throwable and results in an NPE:
https://github.com/spring-projects/spring-batch/blob/e6c27273fa2b3713c6f2d472bf3de1b18f8e5eba/spring-batch-infrastructure/src/main/java/org/springframework/batch/repeat/exception/DefaultExceptionHandler.java#L37-L39
This can be overridden by a custom ExceptionHandler, so that no NPE is thrown.
- If DEBUG logging is enabled, an NPE can also be thrown here, since the unwrapped throwable is null:
https://github.com/spring-projects/spring-batch/blob/e6c27273fa2b3713c6f2d472bf3de1b18f8e5eba/spring-batch-infrastructure/src/main/java/org/springframework/batch/repeat/support/RepeatTemplate.java#L288-L290
This can also be avoided by turning off DEBUG logging.
- And finally here:
https://github.com/spring-projects/spring-batch/blob/e6c27273fa2b3713c6f2d472bf3de1b18f8e5eba/spring-batch-infrastructure/src/main/java/org/springframework/batch/repeat/support/RepeatTemplate.java#L215-L217
I would expect running to be set to false; however, this doesn't happen, and the RepeatContext is still not complete.
- Using reflection, I was able to add a RepeatListener to the RepeatTemplate which calls context.setTerminateOnly() when the application is shutting down. This breaks the endless loop here, but afterwards, in AbstractStep, it again tries to rethrow null after extracting the cause from this RepeatException:
https://github.com/spring-projects/spring-batch/blob/e6c27273fa2b3713c6f2d472bf3de1b18f8e5eba/spring-batch-core/src/main/java/org/springframework/batch/core/step/AbstractStep.java#L232
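According to the analysis above, the cause extracted from the RepeatException is null, and in Java a throw statement on a null reference raises a NullPointerException instead. A minimal sketch of a null-safe unwrap follows, assuming hypothetical helpers (unwrap, rethrow) that fall back to the wrapper exception when it has no cause. This is not the actual AbstractStep or RepeatTemplate code.

```java
final class NullSafeUnwrap {

    // Hypothetical helper: returns the cause of a wrapper exception, or the
    // wrapper itself when the cause is null, so the result is never null.
    static Throwable unwrap(Throwable wrapper) {
        Throwable cause = wrapper.getCause();
        return (cause != null) ? cause : wrapper;
    }

    // Rethrows the unwrapped throwable. Declared to return RuntimeException so
    // callers can write "throw rethrow(e);", but it never returns normally.
    static RuntimeException rethrow(Throwable wrapper) {
        Throwable t = unwrap(wrapper);
        if (t instanceof RuntimeException) {
            throw (RuntimeException) t;
        }
        if (t instanceof Error) {
            throw (Error) t;
        }
        // Checked exceptions are wrapped rather than thrown directly.
        throw new IllegalStateException(t);
    }
}
```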
Environment
- openjdk version "17.0.7" 2023-04-18 LTS
- Spring Batch 5.0.2
- Spring Boot 3.1.1
- PostgreSQL 15.3
Steps to reproduce
See above.
Expected behavior
- after the graceful period, Spring shall be able to forcefully close the ApplicationContext
- no NPE or other exception is expected to be thrown
- the related step shall be saved with the STOPPED state in the database
Minimal Complete Reproducible example
TBD, I will try to create a minimal example for this.
springbatchissue.zip
Steps to reproduce:
- unzip
- execute ./gradlew jibDockerBuild to create a Docker image
- start the stack using docker-compose up
- check the logs for the worker; immediately after the first message is received by the worker, execute kill -15 1 to kill it
You will see that the app doesn't terminate after the graceful period ends. Execute kill -3 1 (which makes the JVM print a thread dump) and you will see that it's hanging in an endless loop.
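kill -3 makes the JVM print a thread dump of every thread. When signalling the container is inconvenient, a similar view can be taken from inside the JVM with Thread.getAllStackTraces(). Below is a minimal sketch, assuming a hypothetical helper that lists the live threads currently executing a method with a given name (for example executeInternal in this issue). It is a debugging aid only and is not part of Spring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class StuckThreadFinder {

    // Hypothetical helper: returns the names of live threads whose current
    // stack contains a frame for the given method name.
    static List<String> threadsInMethod(String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getMethodName().equals(methodName)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame per thread is enough
                }
            }
        }
        return names;
    }
}
```

For example, logging threadsInMethod("executeInternal") periodically after SIGTERM would show whether any step threads are still inside the repeat loop.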