mojo
mojo copied to clipboard
Subprocess: exit_code randomly garbled
- Mojolicious version: 9.21
- Perl version: 5.16.3
- Operating system: CentOS 7
- Architecture: x86_64
Steps to reproduce the behavior
Mojo::IOLoop::Subprocess randomly returns garbage in exit_code. I strongly suspect something else is reaping the children, but I can't figure out what. I can confirm that waitpid() in Subprocess.pm sometimes returns -1, and sets $? to -1 as well.
It seems to happen only when there's more than one Subprocess running at the same time.
Here's a minimal example to reproduce:
#!/usr/bin/perl
use warnings;
use strict;
use Mojolicious::Lite;
use Mojo::IOLoop::Subprocess;
use Time::HiRes qw(usleep);
sub start_child() {
my $p = Mojo::IOLoop::Subprocess->new;
$p->run_p (sub {
usleep (rand (100) * 1000);
})->then (sub {
warn "pid=" . $p->pid . " exited: exit_code=" . $p->exit_code;
$p->exit_code == 0 or do { warn "this should never happen"; exit (1); };
})->finally ( sub {
$p->ioloop->next_tick (\&start_child);
});
$p->on (spawn => sub {
warn "started pid=" . $p->pid;
});
};
Mojo::IOLoop->singleton->next_tick (\&start_child);
Mojo::IOLoop->singleton->next_tick (\&start_child);
app->start;
Sample output:
$ ./test1.pl daemon
[2022-01-08 19:24:14.29748] [19819] [info] Listening at "http://*:3000"
Web application available at http://127.0.0.1:3000
started pid=19820 at ./test1.pl line 20.
started pid=19821 at ./test1.pl line 20.
pid=19821 exited: exit_code=0 at ./test1.pl line 14.
started pid=19822 at ./test1.pl line 20.
pid=19820 exited: exit_code=0 at ./test1.pl line 14.
started pid=19823 at ./test1.pl line 20.
pid=19822 exited: exit_code=0 at ./test1.pl line 14.
started pid=19824 at ./test1.pl line 20.
pid=19823 exited: exit_code=0 at ./test1.pl line 14.
started pid=19825 at ./test1.pl line 20.
pid=19824 exited: exit_code=0 at ./test1.pl line 14.
started pid=19826 at ./test1.pl line 20.
pid=19826 exited: exit_code=0 at ./test1.pl line 14.
started pid=19827 at ./test1.pl line 20.
pid=19825 exited: exit_code=0 at ./test1.pl line 14.
started pid=19828 at ./test1.pl line 20.
pid=19827 exited: exit_code=72057594037927935 at ./test1.pl line 14.
this should never happen at ./test1.pl line 15.
Expected behavior
exit_code should always be 0 in the above example.
Actual behavior
exit_code sometimes contains 72057594037927935, which is really -1 (all bits set to 1).
exit_code sometimes contains 72057594037927935, which is really -1 (all bits set to 1).
this is actually -1 >> 8, which is how exit_code is set from $? after waitpid.
Perhaps $? should be checked for -1 to throw a more useful error (I believe $! would be set if this happens). But it sounds like you're talking about the bug that it returns -1 at all.
Or perhaps it should be ignored if it's -1; the documentation just says this indicates the process has already been reaped. But that means we don't know the exit code which is unfortunate.
But what's reaping them? Something inside Mojolicious?
I can't think of anything that would be doing so, and haven't run into this problem
Does my example fail for you?
I can't find anything either, but there are lines similar to: waitpid (-1, WNOHANG) in Mojo/Server/Prefork.pm and AnyEvent.pm . I don't think these are used by default so it wouldn't cause problems with my example above, but still...
If it's not fixable, I would suggest to at least document it.
And maybe also warn people about calling exec in the child, as it causes IOLoop to lock up.