Parallel processing in a loop for more than 200 iterations causes zombie processes
I'm using amphp/parallel to download a number of image files in parallel, so that it takes less time than sequential processing. It works fine with up to 200 requests: all files are downloaded successfully within a few seconds. With more than 200 requests, however, it gets stuck, and 200 zombie processes appear in the process table (visible with the Linux top command). In that case, 200 of the files are still downloaded successfully.
/**
 * $items An array of files to download
 */
Loop::run(function () use ($items) {
    try {
        $processArray = array();
        // Use "<" rather than "<=" so the loop does not read past the end of $items.
        for ($counter = 0; $counter < count($items); $counter++) {
            // Create a new child process that does some blocking stuff.
            $context = yield Process::run(__DIR__ . "/Blocking.php");
            \assert($context instanceof Process);
            array_push($processArray, $context);
            // Pipe any data written to the STDOUT in the child process to STDOUT of this process.
            rethrow(ByteStream\pipe($context->getStdout(), ByteStream\getStdout()));
            yield $context->send($items[$counter]);
        }
        foreach ($processArray as $p) {
            yield $p->receive(); // Message from the child
            yield $p->join(); // Wait for the child process to exit
        }
    } finally {
    }
});
Blocking.php
return function (Channel $channel): \Generator {
    $data = yield $channel->receive();
    file_put_contents($data['path'], file_get_contents($data['url']));
    yield $channel->send("");
    return true;
};
Because the zombie processes remain in the process table, new requests get no response. Apache has to be restarted, which clears the zombie processes. Proof: https://drive.google.com/file/d/1Z4WMGKA_56QaBMdZ6wXkyx7ilBPYYSkw/view
I do not know why the zombie processes are being created. Is there a problem with how I implemented parallel processing? Are the processes not exiting correctly? Is there an Apache limit on the number of processes per request?
It appears xdebug was enabled. Please disable xdebug and let me know if the problem persists.
I have disabled xdebug, but without success. I got the same issue again.
If you're just downloading files, I strongly recommend using https://github.com/amphp/http-client instead, which doesn't require multiple processes. It'll be much more efficient and performant.
Zombie processes might be due to open file limits kicking in and might only be a follow-up error.
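One way to test the open-file-limit theory is to compare the limit of the PHP process with the number of child processes being started. The sketch below assumes the posix extension is loaded; the helper names and the figure of four descriptors per child are illustrative guesses, not values taken from amphp.

```php
<?php

/**
 * Return the soft limit on open file descriptors for this process,
 * or null if it cannot be determined (e.g. posix extension missing).
 */
function openFileLimit(): ?int
{
    if (!function_exists('posix_getrlimit')) {
        return null;
    }

    $limits = posix_getrlimit();
    if (!is_array($limits) || !isset($limits['soft openfiles'])) {
        return null;
    }

    $soft = $limits['soft openfiles'];

    // posix_getrlimit() reports an unlimited value as the string "unlimited".
    return $soft === 'unlimited' ? PHP_INT_MAX : (int) $soft;
}

/**
 * Rough check whether $children child processes fit under $limit.
 * $fdsPerChild is an assumption (pipes and sockets held by the parent per child).
 */
function fitsWithinLimit(int $children, int $limit, int $fdsPerChild = 4): bool
{
    // Divide instead of multiplying to avoid overflow for very large limits.
    return $children <= intdiv($limit, $fdsPerChild);
}

$limit = openFileLimit();
if ($limit !== null) {
    echo "Soft open-file limit: {$limit}" . PHP_EOL;
    echo fitsWithinLimit(284, $limit) ? "284 children should fit" : "284 children may exceed the limit";
    echo PHP_EOL;
}
```

If the reported limit is close to a few multiples of 200, that would be consistent with the processes stalling at that count, though it does not prove it.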
Following your suggestion, I'm modifying the code like this:
Loop::run(static function () use ($items): \Generator {
    // Instantiate the HTTP client
    $client = HttpClientBuilder::buildDefault();
    $requestHandler = static function (string $uri) use ($client): \Generator {
        /** @var Response $response */
        $response = yield $client->request(new Request($uri));
        return yield $response->getBody()->buffer();
    };
    try {
        $promises = [];
        foreach ($items as $item) {
            $promises[$item['url']] = Amp\call($requestHandler, $item['url']);
        }
        $bodies = yield $promises;
        // The array keys are the URLs used when building $promises.
        foreach ($bodies as $uri => $body) {
            print $uri . " - " . \strlen($body) . " bytes" . PHP_EOL;
        }
    } catch (HttpException $error) {
        // If something goes wrong Amp will throw the exception where the promise was yielded.
        // The HttpClient::request() method itself will never throw directly, but returns a promise.
        echo $error;
    }
});
But I can't work out where to add the code that saves each downloaded file to a specific path, as I was doing in Blocking.php earlier.
It'll work similarly to before. You can also stream the response body directly to disk without buffering it.
Loop::run(static function () use ($items): \Generator {
    $client = HttpClientBuilder::buildDefault();
    // Each job is an array with 'url' and 'path' keys.
    $requestHandler = static function (array $job) use ($client): \Generator {
        /** @var Response $response */
        $response = yield $client->request(new Request($job['url']));
        yield Amp\ByteStream\pipe($response->getBody(), yield Amp\File\open($job['path'], 'w'));
    };
    try {
        $promises = [];
        foreach ($items as $item) {
            $promises[$item['url']] = Amp\call($requestHandler, $item);
        }
        yield $promises;
    } catch (HttpException $error) {
        echo $error;
    }
});
Wow, I'm able to download 284 files in a moment. I made some changes to the code to fix the issues I found.
Loop::run(static function () use ($items): \Generator {
    // Instantiate the HTTP client
    $client = HttpClientBuilder::buildDefault();
    $requestHandler = static function (array $job) use ($client): \Generator {
        $part_file = $job['part'];
        /** @var Response $response */
        $response = yield $client->request(new Request($job['url']));
        yield ByteStream\pipe($response->getBody(), yield File\openFile($job['path'], 'w'));
        // Remove the partial file once the download has been written.
        if (file_exists($part_file)) {
            unlink($part_file);
        }
    };
    try {
        $promises = [];
        foreach ($items as $item) {
            $promises[$item['url']] = call($requestHandler, $item);
        }
        yield $promises;
    } catch (HttpException $error) {
        echo $error;
    }
});
But at the end, the following error is thrown:
Amp\Http\Client\TimeoutException: Allowed transfer timeout exceeded, took longer than 10000 ms in /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/http-client/src/Connection/Http1Connection.php:443
Stack trace:
#0 [internal function]: Amp\Http\Client\Connection\Http1Connection->Amp\Http\Client\Connection\{closure}()
#1 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Coroutine.php(118): Generator->send(NULL)
#2 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/Placeholder.php(46): Amp\Coroutine->Amp\{closure}(NULL, NULL)
#3 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/PrivatePromise.php(23): class@anonymous->onResolve(Object(Closure))
#4 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/Placeholder.php(143): Amp\Internal\PrivatePromise->onResolve(Object(Closure))
#5 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Deferred.php(52): class@anonymous->resolve(Object(Amp\Internal\PrivatePromise))
#6 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(282): Amp\Deferred->resolve(Object(Amp\Internal\PrivatePromise))
#7 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/Placeholder.php(149): Amp\Promise\{closure}(NULL, NULL)
#8 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Deferred.php(52): class@anonymous->resolve(NULL)
#9 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/byte-stream/lib/ResourceInputStream.php(198): Amp\Deferred->resolve()
#10 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/byte-stream/lib/ResourceInputStream.php(182): Amp\ByteStream\ResourceInputStream->free()
#11 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/socket/src/ResourceSocket.php(166): Amp\ByteStream\ResourceInputStream->close()
#12 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/http-client/src/Connection/Http1Connection.php(122): Amp\Socket\ResourceSocket->close()
#13 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(90): Amp\Http\Client\Connection\Http1Connection->close(Object(Amp\CancelledException))
#14 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(121): Amp\call(Array, Object(Amp\CancelledException))
#15 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CombinedCancellationToken.php(29): Amp\asyncCall(Array, Object(Amp\CancelledException))
#16 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(90): Amp\CombinedCancellationToken->Amp\{closure}(Object(Amp\CancelledException))
#17 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(121): Amp\call(Object(Closure), Object(Amp\CancelledException))
#18 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CombinedCancellationToken.php(29): Amp\asyncCall(Object(Closure), Object(Amp\CancelledException))
#19 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CancellationTokenSource.php(92): Amp\CombinedCancellationToken->Amp\{closure}(Object(Amp\CancelledException))
#20 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CancellationTokenSource.php(77): class@anonymous->invokeCallback(Object(Closure))
#21 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CancellationTokenSource.php(161): class@anonymous->Amp\{closure}(Object(Amp\CancelledException))
#22 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/TimeoutCancellationToken.php(30): Amp\CancellationTokenSource->cancel(Object(Amp\TimeoutException))
#23 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop/NativeDriver.php(111): Amp\TimeoutCancellationToken::Amp\{closure}('bnq', NULL)
#24 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop/Driver.php(138): Amp\Loop\NativeDriver->dispatch(true)
#25 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop/Driver.php(72): Amp\Loop\Driver->tick()
#26 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop.php(95): Amp\Loop\Driver->run()
#27 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/src/Async/AsyncFactory.php(93): Amp\Loop::run(Object(Closure))
#28 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/src/Async/AsyncFactory.php(97): Blended\hostlib\Async\AsyncFactory->process(Array)
#29 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/src/Backend.php(666): Blended\hostlib\Async\AsyncFactory->collect(Array)
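The 10000 ms in the message matches the default transfer timeout of an amphp/http-client request, so a download that starts late or runs slowly among 284 concurrent transfers can exceed it. A possible way around this, sketched under assumptions, is to raise the timeout per request with Request::setTransferTimeout() and to cap the number of simultaneous downloads with Amp\Sync\LocalSemaphore. The function names downloadOne and downloadAll, the limit of 20 and the 60000 ms timeout are illustrative choices, not values from this thread, and the sketch assumes amphp/sync and amphp/file are installed.

```php
<?php

use Amp\ByteStream;
use Amp\File;
use Amp\Http\Client\HttpClient;
use Amp\Http\Client\HttpClientBuilder;
use Amp\Http\Client\Request;
use Amp\Promise;
use Amp\Sync\LocalSemaphore;
use function Amp\call;

/**
 * Download one job (['url' => ..., 'path' => ...]) to disk.
 * Hypothetical helper: waits for a semaphore slot, then streams the body to the file.
 */
function downloadOne(HttpClient $client, LocalSemaphore $semaphore, array $job, int $timeoutMs): Promise
{
    return call(static function () use ($client, $semaphore, $job, $timeoutMs): \Generator {
        // Wait until one of the concurrency slots is free.
        $lock = yield $semaphore->acquire();

        try {
            $request = new Request($job['url']);
            // Raise the per-request transfer timeout (milliseconds).
            $request->setTransferTimeout($timeoutMs);

            $response = yield $client->request($request);
            $file = yield File\openFile($job['path'], 'w');

            // Stream the body to disk without buffering it in memory.
            yield ByteStream\pipe($response->getBody(), $file);
            yield $file->close();
        } finally {
            // Free the slot even if the request failed.
            $lock->release();
        }
    });
}

/**
 * Start all downloads, at most $maxConcurrent at a time.
 * Resolves to [$errors, $results], both keyed by URL, so one failure
 * does not discard the downloads that succeeded.
 */
function downloadAll(array $items, int $maxConcurrent = 20, int $timeoutMs = 60000): Promise
{
    $client = HttpClientBuilder::buildDefault();
    $semaphore = new LocalSemaphore($maxConcurrent);

    $promises = [];
    foreach ($items as $item) {
        $promises[$item['url']] = downloadOne($client, $semaphore, $item, $timeoutMs);
    }

    return Promise\any($promises);
}

// Usage inside the event loop (requires the amphp packages to be autoloaded):
// Amp\Loop::run(static function () use ($items): \Generator {
//     [$errors, $done] = yield downloadAll($items);
//     foreach ($errors as $url => $error) {
//         echo $url . ": " . $error->getMessage() . PHP_EOL;
//     }
// });
```

Because the slot is acquired before the request is created, time spent waiting for a slot does not count against the transfer timeout.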
I believe zombie process issues were fixed by https://github.com/amphp/process/commit/8c769ff2b3ed7f507640f0df25409ef97a3563a8 and https://github.com/amphp/process/commit/76e9495fd6818b43a20167cb11d8a67f7744ee0f some time ago.