
Dealing with zombie processes

Open mlasri-web2 opened this issue 2 years ago • 1 comments

Hi, I have an API that runs multiple Parallel tasks. In production and preprod, after upgrading to the latest version, we noticed that we are left with a huge number of zombie processes: each request leaves 8 to 10 zombies (equal to the number of workers), and we reached 5k zombies in less than 1 hour.

foreach ($tasks as $key => $task) {
    // Assumption: $taskKey in the original snippet was meant to be the loop key.
    $executions[$key] = submit(
        $evaluation,
        new TimeoutCancellation($timeout, sprintf("Engine timeout reached! %s", $timeout))
    );
}

[$errors, $result] = awaitAll(array_map(
    fn (Execution $exe) => $exe->getFuture(),
    $executions
));

foreach ($errors as $key => $response) {
    // sprintf() returns the formatted string; printf() would echo it and return its length.
    $this->log('warning', sprintf(
        "First pool error with key: %s and message %s \n",
        $key,
        $response
    ));
}

Note: the pcntl extension is enabled.

Update 1:

  • I tried reducing the maximum number of workers, but it made no difference; we still get zombies: $this->worker = new ContextWorkerPool(5);

@kelunik, could you please look into this problem with me?

Thanks in advance.

mlasri-web2 avatar Oct 19 '23 09:10 mlasri-web2

Hi @mlasri-web2!

I thought the zombie process was fixed in amphp/process, see PosixHandle::waitPid(), which should be invoked by the event-loop callback which is awaiting the exit code pipe for the process. Would you be able to have a look to see if there's any reason the waitPid function is not being invoked?
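For context, a zombie is a child process that has exited but has not yet been waited on by its parent. The following is a minimal sketch of that reaping step using plain pcntl calls. It is not the amphp/process code path, and `forkAndReap()` is a hypothetical helper named here only for illustration.

```php
<?php
// Illustration only: plain pcntl calls, not how amphp/process is implemented.

/**
 * Forks a child that exits immediately, then reaps it.
 * Returns the child's exit code, or null if the child could not be reaped.
 * forkAndReap() is a hypothetical helper, not part of any library.
 */
function forkAndReap(int $exitCode): ?int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }

    if ($pid === 0) {
        // Child: terminate right away with the requested exit code.
        exit($exitCode);
    }

    // Parent: until waitpid() is called for this PID, the exited child
    // stays in the process table as a zombie (<defunct> in ps output).
    $status = 0;
    $reaped = pcntl_waitpid($pid, $status);

    if ($reaped !== $pid || !pcntl_wifexited($status)) {
        return null;
    }

    return pcntl_wexitstatus($status);
}
```

If the wait call is never reached for a given child, that child remains a zombie for as long as the parent process is alive, which would match a count that grows with every request.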

trowski avatar Nov 26 '23 15:11 trowski

Hi @trowski, I am reopening this subject: as a temporary workaround for these zombies, we have been using a job to delete the .sock files left in /tmp/.

The current situation is that we use the Process in two ways, CLI and HTTP. The CLI cleans up the .sock files properly, but the same service in a Symfony controller (POST HTTP endpoint) leaves one .sock file per request: srwxrwxrwx 1 www-data www-data 0 Apr 5 00:53 amp-parallel-ipc-87cff776753e279b1e56.sock.

Note: PosixHandle::asyncWaitPid is invoked

I am still trying to figure out the cause and a solution.

medy36 avatar Apr 05 '24 01:04 medy36

The temporary .sock file should be removed by LocalIpcHub::unlink(), which should be invoked in the destructor. Can you have a look if this function is being invoked, and if not, why that might be happening? We are suppressing errors there, you could try removing that suppression to see if there is an unexpected error (though if the file was created, I'd expect removing it to succeed as well).
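The destructor-based cleanup described above can be sketched as follows. This is an illustration of the pattern only, not the LocalIpcHub implementation: the class `TempSocketPath`, its file prefix, and its methods are hypothetical. The `unlink()` call is deliberately left without the `@` suppression operator so that any warning would be visible.

```php
<?php
// Hypothetical class for illustration; not the LocalIpcHub implementation.
final class TempSocketPath
{
    private string $path;

    public function __construct(string $prefix = 'example-ipc-')
    {
        // Build a unique path in the system temp directory and create a
        // placeholder file standing in for the real socket.
        $this->path = sys_get_temp_dir() . '/' . $prefix . bin2hex(random_bytes(10)) . '.sock';
        touch($this->path);
    }

    public function getPath(): string
    {
        return $this->path;
    }

    public function unlink(): void
    {
        if (!file_exists($this->path)) {
            return; // Already removed, nothing to do.
        }

        // No "@" here on purpose: a failure emits a warning and is logged.
        if (!unlink($this->path)) {
            error_log('Could not remove ' . $this->path);
        }
    }

    public function __destruct()
    {
        // Runs when the last reference is dropped or at normal script shutdown.
        $this->unlink();
    }
}
```

With this pattern the file disappears as soon as the last reference to the object is released, for example after `unset($file)`. If something still holds a reference, or the process ends without a normal shutdown, the destructor does not run at that point and the file stays in place.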

trowski avatar Apr 05 '24 02:04 trowski

@trowski thank you for pointing me to LocalIpcHub::unlink(). I found out that amphp/parallel was at version v2.2.2, which does not include the destructor.

The upgrade works perfectly! Thank you very much.

medy36 avatar Apr 08 '24 08:04 medy36

@medy36 Thanks for confirming my recent changes fixed the issue!

trowski avatar Apr 13 '24 03:04 trowski

Hi @trowski, the issue of .sock files left in /tmp/ still persists when executing in the CLI, with the same code.

medy36 avatar May 23 '24 13:05 medy36

Hey @medy36! So the fix worked, then stopped working? Nothing has changed that I'm aware of. Would you be able to give me a bit more context and some code to reproduce the issue?

trowski avatar May 23 '24 14:05 trowski

The fix does work! When the code is executed in an HTTP request, the fix works.

But when the same code is executed in the CLI (I am running a job), the .sock files are left behind.

medy36 avatar May 23 '24 14:05 medy36

@trowski should I open a new issue regarding the execution in CLI mode?

medy36 avatar May 24 '24 08:05 medy36

@trowski in CLI mode, __destruct() is not even reached!

So no unlink is performed.

I am struggling at this point; if you have any guidance to offer, that would be very helpful.

medy36 avatar Jul 11 '24 09:07 medy36