
Destroy forcibly forked test using SIGKILL

Open tkowalcz opened this issue 1 year ago • 3 comments

If the JVM of the monitored forked test hangs, Process.destroy might not be effective. On Unix systems it sends SIGTERM, which a stuck JVM may never act on. An alternative is to use the Process.destroyForcibly method, which sends SIGKILL.

Using SIGKILL will not allow the monitored process to shut down cleanly, but if it has already timed out it will probably never do any cleanup anyway.

Alternatively, the watchdog could try destroy first and then fall back to destroyForcibly.
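To make the "destroy first, then destroyForcibly" idea concrete, here is a rough sketch using only the standard java.lang.Process API. The class name ProcessTerminator, the method name terminate, and the grace period are made up for illustration; this is not Ant's code and not a proposed patch.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Ant: escalate from a polite request to a forced kill.
public final class ProcessTerminator {

    private ProcessTerminator() {
    }

    /**
     * Requests normal termination, waits up to the grace period, and only then
     * kills the process forcibly. Returns true if the forced kill was needed.
     */
    public static boolean terminate(Process process, Duration grace) throws InterruptedException {
        if (!process.isAlive()) {
            return false; // nothing to do
        }
        // Normal termination request; the process may handle or ignore it.
        process.destroy();
        if (process.waitFor(grace.toMillis(), TimeUnit.MILLISECONDS)) {
            return false; // exited within the grace period
        }
        // Still alive: escalate. The wait is bounded so the caller cannot hang here.
        process.destroyForcibly().waitFor(grace.toMillis(), TimeUnit.MILLISECONDS);
        return true;
    }
}
```

A caller would pass the forked Process and something like Duration.ofSeconds(5). A well-behaved process still gets a chance to shut down cleanly, and only a stuck one is killed outright.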

Comments? Thanks!

tkowalcz avatar Aug 02 '24 13:08 tkowalcz

Hello Tomasz, can you tell us the details of the issue you are running into? That might help understand what change needs to be done.

jaikiran avatar Aug 03 '24 13:08 jaikiran

> Hello Tomasz, can you tell us the details of the issue you are running into? That might help understand what change needs to be done.

Absolutely. I just wanted to get the conversation started. Thanks for taking time to reply.

When using junitlauncher with timeout:

<junitlauncher
    taskname="JUnit5"
    haltonfailure="${junit.haltonfailure}"
    failureproperty="junit.failures"
    printsummary="false">
    ...
    <fork timeout="${junit.timeout}">
        ...
    </fork>
</junitlauncher>

it sets up an ExecuteWatchdog that terminates the forked process once the timeout passes. If the forked JVM is very busy (e.g. doing GC back to back), it will not terminate: it has the signal handler installed but fails to act upon receiving the signal. The only option then is to issue a SIGKILL.
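For readers unfamiliar with the mechanism, a timeout watchdog of this kind can be pictured roughly as below. This is a simplified sketch, not Ant's ExecuteWatchdog: the class ForkWatchdog and its methods are hypothetical, it assumes Java 9+ for Process.onExit, and it goes straight to destroyForcibly to show the SIGKILL variant.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a timeout watchdog; names are illustrative only.
public final class ForkWatchdog implements AutoCloseable {

    // Single daemon timer thread so the watchdog never keeps the JVM alive.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fork-watchdog");
                t.setDaemon(true);
                return t;
            });

    private final AtomicBoolean killed = new AtomicBoolean(false);

    /** Arms the watchdog: a process still alive after the timeout is killed forcibly. */
    public ScheduledFuture<?> watch(Process process, long timeout, TimeUnit unit) {
        ScheduledFuture<?> task = timer.schedule(() -> {
            if (process.isAlive()) {
                killed.set(true);          // record the reason before killing
                process.destroyForcibly(); // cannot be ignored by a stuck process
            }
        }, timeout, unit);
        // Disarm the timer as soon as the process exits on its own.
        process.onExit().thenRun(() -> task.cancel(false));
        return task;
    }

    /** True if the watchdog had to kill the process because of the timeout. */
    public boolean killedProcess() {
        return killed.get();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

The caller can wait on the process as usual and check killedProcess() afterwards to tell a timeout apart from an ordinary failure.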

In our case the JVM was stuck for a yet-to-be-discovered reason. Tools like jstack were unable to attach to it unless the -Force option was used. The CI job running the test suite got stuck waiting for the Ant task to time out, but it never did. Eventually the job-level timeout in Jenkins kicked in and terminated the parent process.

I was able to verify the following: sending SIGINT to the forked JVM did not shut it down. Sending SIGKILL did, and junitlauncher then continued properly: it set failureproperty and carried on when haltonfailure was set to false.

Since the test in question got stuck consistently, I verified that after this change ExecuteWatchdog correctly terminated the forked process.

tkowalcz avatar Aug 04 '24 10:08 tkowalcz

Hello Tomasz,

> Tools like jstack were unable to attach to it unless -Force option was used.

Were you able to get hold of a thread dump with -F?

This code dealing with process termination resides in a core layer of Ant and has been around for a long time. So having as many details as possible about what's causing this issue will help us understand whether this code deserves a change or whether we should address this in a different manner (perhaps in the junitlauncher task-specific code).

jaikiran avatar Aug 10 '24 13:08 jaikiran