Cavalcade-Runner
Jobs show "failed" when service is restarted.
When the cavalcade service is restarted, we do two things:
- Ignore the signal in any wp cavalcade run processes
- Wait for all running workers to complete in the cavalcade-runner
This works well to let the jobs complete, but the status of the process is changed, and cavalcade-runner interprets it as a failure. is_done will return true here, but shutdown() will return -1. This is because (it seems) once a process has been sent SIGTERM, proc_get_status will return:
(
[command] => wp cavalcade run 440 --url='example.com/'
[pid] => 13589
[running] =>
[signaled] => 1
[stopped] =>
[exitcode] => -1
[termsig] => 15
[stopsig] => 0
)
(see exitcode)
According to the PHP docs: "The exit code returned by the process (which is only meaningful if running is FALSE). Only first call of this function return real value, next calls return -1." I think this might be an undocumented side-effect of a process ending with SIGTERM.
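To illustrate the "only first call" caveat quoted above, here is a minimal sketch that polls a process and keeps the first status reported once it stops running, so the exit code is read exactly once. The helper name wait_for_exit and the poll interval are hypothetical and not taken from Cavalcade-Runner; the example command assumes a POSIX shell.

```php
<?php
// Hypothetical helper (not part of Cavalcade-Runner): poll a proc_open()
// handle and return the first status reported after the process stops.
// Keeping this array avoids calling proc_get_status() again later, when
// the documented behaviour is that exitcode may come back as -1.
function wait_for_exit( $process, int $poll_interval_us = 100000 ): array {
	while ( true ) {
		$status = proc_get_status( $process );
		if ( ! $status['running'] ) {
			return $status;
		}
		// Sleep briefly instead of spinning at 100% CPU.
		usleep( $poll_interval_us );
	}
}

// Example: a shell command that exits with a known code.
$pipes   = [];
$process = proc_open( 'exit 3', [], $pipes );
if ( ! is_resource( $process ) ) {
	throw new Exception( 'Unable to proc_open.' );
}

$final = wait_for_exit( $process );
echo $final['exitcode'], "\n";
```

Anything that needs the exit code afterwards should read it from the saved array rather than asking PHP again.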
I think we need some logic to handle the case when signaled => 1 or termsig => 15, and maybe return 0 instead of -1 in those cases?
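A rough sketch of what that check could look like, operating on the array returned by proc_get_status. The function name normalize_exit_code is hypothetical, and mapping SIGTERM to success is only the idea floated above, not existing Cavalcade-Runner behaviour. The literal 15 is used for SIGTERM so the sketch does not depend on the pcntl extension.

```php
<?php
// Hypothetical helper (not part of Cavalcade-Runner): turn a
// proc_get_status() array into the exit code the runner should record.
function normalize_exit_code( array $status ): int {
	$exit_code = (int) ( $status['exitcode'] ?? -1 );

	// 15 is SIGTERM on Linux and other common POSIX systems.
	$got_sigterm = ! empty( $status['signaled'] )
		&& (int) ( $status['termsig'] ?? 0 ) === 15;

	// Assumption from this thread: a worker that reports SIGTERM together
	// with exitcode -1 was caught by a service restart, so treat it as OK.
	if ( $got_sigterm && -1 === $exit_code ) {
		return 0;
	}

	return $exit_code;
}

// The status reported during a service restart, as shown above.
$restart_status = [
	'running'  => false,
	'signaled' => true,
	'stopped'  => false,
	'exitcode' => -1,
	'termsig'  => 15,
	'stopsig'  => 0,
];
echo normalize_exit_code( $restart_status ), "\n";
```

One trade-off to keep in mind: a job that really was killed part-way through by SIGTERM would also be recorded as successful under this rule.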
Hmm, I'm unable to replicate this locally sending a SIGTERM to a child process. Code I'm using:
<?php
// test-run.php
$command = 'php test-kill.php';
$spec = [
	1 => STDOUT,
	2 => STDERR,
];
$pipes = [];
$process = proc_open( $command, $spec, $pipes );
if ( ! is_resource( $process ) ) {
	throw new Exception( 'Unable to proc_open.' );
}
while ( true ) {
	$status = proc_get_status( $process );
	if ( ! $status['running'] ) {
		echo "Finished\n";
		break;
	}
}
var_dump( $status );
<?php
// test-kill.php
// Ignore signals.
pcntl_signal( SIGTERM, SIG_IGN );
echo getmypid() . "\n";
sleep( 10 );
echo "Done\n";
I'm sending a SIGTERM with kill {pid}.
This gives me:
array(8) {
["command"]=>
string(17) "php test-kill.php"
["pid"]=>
int(19901)
["running"]=>
bool(false)
["signaled"]=>
bool(false)
["stopped"]=>
bool(false)
["exitcode"]=>
int(0)
["termsig"]=>
int(0)
["stopsig"]=>
int(0)
}
With the pcntl_signal disabled:
array(8) {
["command"]=>
string(17) "php test-kill.php"
["pid"]=>
int(19904)
["running"]=>
bool(false)
["signaled"]=>
bool(false)
["stopped"]=>
bool(false)
["exitcode"]=>
int(143)
["termsig"]=>
int(0)
["stopsig"]=>
int(0)
}
Interestingly, signaled is false here too?
Confirmed the same behaviour when test-kill.php is inside a wp-cli command instead, so it's not wp-cli causing this I guess.
Hmm yeah I'm seeing the same with your script. There seems to be something wrong with the results though. I can't get signaled => true to happen with this script, whereas I am seeing that happen with the service restart. According to the docs: "TRUE if the child process has been terminated by an uncaught signal. Always set to FALSE on Windows.", but if I remove the pcntl_signal from test-kill.php and kill it, I don't get signaled => true.
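One likely explanation for the missing signaled => true: when proc_open is given a command string, PHP runs it through sh -c, so the process being tracked is the shell, not the PHP script. The shell exits normally with 128 + 15 = 143 when its child dies from SIGTERM, which matches the exitcode above. The sketch below is an assumption-laden way to check this. It needs PHP 7.4+ for the array form of proc_open (which skips the shell) and a POSIX system with a sleep binary, and it uses proc_terminate, which sends SIGTERM by default.

```php
<?php
// Sketch: start the child directly (no intermediate shell) so that
// proc_get_status() describes the child itself.
$pipes   = [];
$process = proc_open( [ 'sleep', '30' ], [], $pipes );
if ( ! is_resource( $process ) ) {
	throw new Exception( 'Unable to proc_open.' );
}

// Send SIGTERM (signal 15, the default) straight to the child.
proc_terminate( $process );

// Poll until the child is gone, keeping the first "not running" status.
do {
	usleep( 10000 );
	$status = proc_get_status( $process );
} while ( $status['running'] );

var_dump( $status['signaled'], $status['termsig'], $status['exitcode'] );
```

If signaled comes back true with termsig 15 here, that would suggest the service restart is delivering SIGTERM to the tracked process itself, rather than something else collecting the exit code first.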
"The exit code returned by the process (which is only meaningful if running is FALSE). Only first call of this function return real value, next calls return -1."
I don't quite get how this works, but is it possible systemd is doing a syscall for the exit code, so we are not able to get it by the time PHP looks for it?