mojo icon indicating copy to clipboard operation
mojo copied to clipboard

Subprocess: exit_code randomly garbled

Open dpanech opened this issue 3 years ago • 6 comments

  • Mojolicious version: 9.21
  • Perl version: 5.16.3
  • Operating system: CentOS 7
  • Architecture: x86_64

Steps to reproduce the behavior

Mojo::IOLoop::Subprocess randomly returns garbage in exit_code. I strongly suspect something else is reaping the children, but I can't figure out what. I can confirm that waitpid() in Subprocess.pm sometimes returns -1, and sets $? to -1 as well.

It seems to happen only when there's more than one Subprocess running at the same time.

Here's a minimal example to reproduce:

#!/usr/bin/perl
use warnings;
use strict;
use Mojolicious::Lite;
use Mojo::IOLoop::Subprocess;
use Time::HiRes qw(usleep);

sub start_child() {
    my $p = Mojo::IOLoop::Subprocess->new;
    $p->run_p (sub {
        usleep (rand (100) * 1000);
    })->then (sub {
        warn "pid=" . $p->pid . " exited: exit_code=" . $p->exit_code;
        $p->exit_code == 0 or do { warn "this should never happen"; exit (1); };
    })->finally ( sub {
        $p->ioloop->next_tick (\&start_child);
    });
    $p->on (spawn => sub {
        warn "started pid=" . $p->pid;
    });
};

Mojo::IOLoop->singleton->next_tick (\&start_child);
Mojo::IOLoop->singleton->next_tick (\&start_child);

app->start;

Sample output:

$ ./test1.pl daemon
[2022-01-08 19:24:14.29748] [19819] [info] Listening at "http://*:3000"
Web application available at http://127.0.0.1:3000
started pid=19820 at ./test1.pl line 20.
started pid=19821 at ./test1.pl line 20.
pid=19821 exited: exit_code=0 at ./test1.pl line 14.
started pid=19822 at ./test1.pl line 20.
pid=19820 exited: exit_code=0 at ./test1.pl line 14.
started pid=19823 at ./test1.pl line 20.
pid=19822 exited: exit_code=0 at ./test1.pl line 14.
started pid=19824 at ./test1.pl line 20.
pid=19823 exited: exit_code=0 at ./test1.pl line 14.
started pid=19825 at ./test1.pl line 20.
pid=19824 exited: exit_code=0 at ./test1.pl line 14.
started pid=19826 at ./test1.pl line 20.
pid=19826 exited: exit_code=0 at ./test1.pl line 14.
started pid=19827 at ./test1.pl line 20.
pid=19825 exited: exit_code=0 at ./test1.pl line 14.
started pid=19828 at ./test1.pl line 20.
pid=19827 exited: exit_code=72057594037927935 at ./test1.pl line 14.
this should never happen at ./test1.pl line 15.

Expected behavior

exit_code should always be 0 in the above example.

Actual behavior

exit_code sometimes contains 72057594037927935, which is really -1 (all bits set to 1).

dpanech avatar Jan 09 '22 00:01 dpanech

exit_code sometimes contains 72057594037927935, which is really -1 (all bits set to 1).

this is actually -1 >> 8, which is how exit_code is set from $? after waitpid.

Grinnz avatar Jan 09 '22 01:01 Grinnz

Perhaps $? should be checked for -1 to throw a more useful error (I believe $! would be set if this happens). But it sounds like you're talking about the bug that it returns -1 at all.

Grinnz avatar Jan 09 '22 01:01 Grinnz

Or perhaps it should be ignored if it's -1; the documentation just says this indicates the process has already been reaped. But that means we don't know the exit code which is unfortunate.

Grinnz avatar Jan 09 '22 01:01 Grinnz

But what's reaping them? Something inside Mojolicious?

dpanech avatar Jan 09 '22 01:01 dpanech

I can't think of anything that would be doing so, and haven't run into this problem

Grinnz avatar Jan 09 '22 01:01 Grinnz

Does my example fail for you?

I can't find anything either, but there are lines similar to: waitpid (-1, WNOHANG) in Mojo/Server/Prefork.pm and AnyEvent.pm . I don't think these are used by default so it wouldn't cause problems with my example above, but still...

If it's not fixable, I would suggest to at least document it.

And maybe also warn people about calling exec in the child, as it causes IOLoop to lock up.

dpanech avatar Jan 09 '22 02:01 dpanech