Cavalcade-Runner

Jobs show "failed" when service is restarted.

Open joehoyle opened this issue 6 years ago • 4 comments

When the cavalcade service is restarted, we do two things:

  • Ignore the signal in any wp cavalcade run processes
  • Wait for all running workers to complete in the cavalcade-runner

This works well to let the jobs complete, but the reported status of the process changes, and cavalcade-runner interprets it as a failure.

is_done will return true here, but shutdown() will return -1. This is because (it seems) once a process has been sent SIGTERM, proc_get_status will return:

(
    [command] => wp cavalcade run 440 --url='example.com/'
    [pid] => 13589
    [running] =>
    [signaled] => 1
    [stopped] =>
    [exitcode] => -1
    [termsig] => 15
    [stopsig] => 0
)

(see exitcode)

According to the PHP docs: "The exit code returned by the process (which is only meaningful if running is FALSE). Only first call of this function return real value, next calls return -1." I think this might be an undocumented side-effect of a process ending with SIGTERM.

I think we need some logic to handle the case where signaled => 1 or termsig => 15, and maybe return 0 instead of -1 in those cases?
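A minimal sketch of what that logic could look like, assuming the runner knows when it is shutting down. The function name and the $shutting_down flag are hypothetical and not part of Cavalcade-Runner; the array keys are the ones proc_get_status returns, and 15 is the SIGTERM value that appears under termsig in the dump above.

```php
<?php
/**
 * Hypothetical helper: decide which exit code to record for a worker.
 *
 * @param array $status        Result of proc_get_status() for the worker.
 * @param bool  $shutting_down Assumed flag: true while the runner is handling a restart.
 * @return int|null Exit code to record, or null while the worker is still running.
 */
function cavalcade_sketch_exit_code( array $status, bool $shutting_down ) {
	// No exit code is meaningful until the process has stopped running.
	if ( ! empty( $status['running'] ) ) {
		return null;
	}

	// 15 is SIGTERM; the literal avoids depending on the pcntl extension.
	$terminated_by_sigterm = ! empty( $status['signaled'] ) && 15 === (int) $status['termsig'];

	// Assumption: a SIGTERM reported during a restart is not a job failure.
	if ( $shutting_down && $terminated_by_sigterm ) {
		return 0;
	}

	return (int) $status['exitcode'];
}
```

Limiting the override to the shutdown window is a design choice in this sketch, so that a worker killed by SIGTERM at any other time would still be recorded as failed.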

joehoyle avatar Aug 30 '18 16:08 joehoyle

Hmm, I'm unable to replicate this locally sending a SIGTERM to a child process. Code I'm using:

<?php
// test-run.php

$command = 'php test-kill.php';
$spec = [
	1 => STDOUT,
	2 => STDERR,
];
$pipes = [];

$process = proc_open( $command, $spec, $pipes );
if ( ! is_resource( $process ) ) {
	throw new Exception( 'Unable to proc_open.' );
}

// Poll until the child is no longer running.
while ( true ) {
	$status = proc_get_status( $process );
	if ( ! $status['running'] ) {
		echo "Finished\n";
		break;
	}

	// Avoid a busy loop between polls.
	usleep( 10000 );
}

var_dump( $status );

<?php
// test-kill.php

// Ignore signals.
pcntl_signal( SIGTERM, SIG_IGN );

echo getmypid() . "\n";

sleep( 10 );

echo "Done\n";

I'm sending a SIGTERM with kill {pid}.

This gives me:

array(8) {
  ["command"]=>
  string(17) "php test-kill.php"
  ["pid"]=>
  int(19901)
  ["running"]=>
  bool(false)
  ["signaled"]=>
  bool(false)
  ["stopped"]=>
  bool(false)
  ["exitcode"]=>
  int(0)
  ["termsig"]=>
  int(0)
  ["stopsig"]=>
  int(0)
}

With the pcntl_signal disabled:

array(8) {
  ["command"]=>
  string(17) "php test-kill.php"
  ["pid"]=>
  int(19904)
  ["running"]=>
  bool(false)
  ["signaled"]=>
  bool(false)
  ["stopped"]=>
  bool(false)
  ["exitcode"]=>
  int(143)
  ["termsig"]=>
  int(0)
  ["stopsig"]=>
  int(0)
}

Interestingly, signaled is false here too?
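One possible reading of the 143, offered as an assumption rather than a confirmed diagnosis: when proc_open is given a command string on a POSIX system, it runs the command through a shell, and shells conventionally report a child killed by signal N as exit code 128 + N. Under that assumption, 143 would be 128 + 15 (SIGTERM), and the status would describe the shell, which exited normally, rather than the PHP child. The helper name below is hypothetical.

```php
<?php
/**
 * Hypothetical helper: if an exit code follows the shell convention of
 * 128 + signal number, return the signal number; otherwise return null.
 *
 * @param int $exit_code Exit code as reported by proc_get_status().
 * @return int|null
 */
function signal_from_shell_exit_code( int $exit_code ) {
	if ( $exit_code > 128 && $exit_code < 256 ) {
		return $exit_code - 128;
	}

	return null;
}

var_dump( signal_from_shell_exit_code( 143 ) ); // int(15), the SIGTERM number
var_dump( signal_from_shell_exit_code( 0 ) );   // NULL
```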

rmccue avatar Aug 31 '18 01:08 rmccue

Confirmed the same behaviour when test-kill.php is inside a wp-cli command instead, so it's not wp-cli causing this I guess.

rmccue avatar Aug 31 '18 01:08 rmccue

Hmm, yeah, I'm seeing the same with your script. There seems to be something wrong with the results, though. I can't get signaled => true to happen with this script, whereas I do see it with the service restart. According to the docs: "TRUE if the child process has been terminated by an uncaught signal. Always set to FALSE on Windows." But if I remove the pcntl_signal call from test-kill.php and kill it, I still don't get signaled => true.

joehoyle avatar Aug 31 '18 20:08 joehoyle

"The exit code returned by the process (which is only meaningful if running is FALSE). Only first call of this function return real value, next calls return -1."

I don't quite get how this works, but is it possible systemd is doing a syscall for the exit code, so we are not able to get it by the time PHP looks for it?
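If the quoted sentence from the docs is the cause, one way to guard against it is to keep the first status in which running is false and reuse it for later checks. This is a sketch with a hypothetical class name, not Cavalcade-Runner's actual code, and it would not help if something else collects the exit code before PHP's first call.

```php
<?php
/**
 * Hypothetical wrapper that remembers the first "not running" status.
 */
final class Exit_Status_Cache {
	/** @var callable Returns an array shaped like proc_get_status() output. */
	private $fetch_status;

	/** @var array|null First status seen with running === false. */
	private $final_status = null;

	public function __construct( callable $fetch_status ) {
		$this->fetch_status = $fetch_status;
	}

	public function status(): array {
		// Once the process has finished, never ask again: later calls may report -1.
		if ( null !== $this->final_status ) {
			return $this->final_status;
		}

		$status = ( $this->fetch_status )();
		if ( ! $status['running'] ) {
			$this->final_status = $status;
		}

		return $status;
	}
}
```

With a real process, the callable could be function () use ( $process ) { return proc_get_status( $process ); }, so that every check in the runner goes through status() instead of calling proc_get_status directly.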

joehoyle avatar Aug 31 '18 21:08 joehoyle