Parallel processing in a loop for more than 200 iterations causes zombie processes

Open nkmaurya-dev opened this issue 5 years ago • 6 comments

I'm using amphp/parallel to download a number of image files in parallel so that it takes less time than sequential processing. It works fine with 200 or fewer requests: all files are downloaded successfully within a few seconds. With more than 200 requests it gets stuck, and 200 zombie processes appear in the process table (visible with the Linux top command). Even in this case, 200 files are downloaded successfully.

use Amp\ByteStream;
use Amp\Loop;
use Amp\Parallel\Context\Process;
use function Amp\Promise\rethrow;

/**
 * $items An array of files to download
 */
Loop::run(function () use ($items) {
    $processArray = [];

    // Use < rather than <= so the loop never reads past the end of $items.
    for ($counter = 0; $counter < count($items); $counter++) {
        // Create a new child process that does some blocking stuff.
        $context = yield Process::run(__DIR__ . "/Blocking.php");

        \assert($context instanceof Process);

        $processArray[] = $context;

        // Pipe any data written to the STDOUT in the child process to STDOUT of this process.
        rethrow(ByteStream\pipe($context->getStdout(), ByteStream\getStdout()));

        yield $context->send($items[$counter]);
    }

    foreach ($processArray as $p) {
        yield $p->receive(); // Message from the child
        yield $p->join(); // Wait for the child process to exit
    }
});

Blocking.php

return function (Channel $channel): \Generator {
    $data = yield $channel->receive();
    file_put_contents($data['path'], file_get_contents($data['url']));
    yield $channel->send("");
    return true;
};

Because the zombie processes remain in the process table, new requests get no response. Restarting the Apache server clears the zombie processes. Proof: https://drive.google.com/file/d/1Z4WMGKA_56QaBMdZ6wXkyx7ilBPYYSkw/view

I do not know why zombie processes are being created. Is there a problem with how my code implements parallel processing? Are the processes not exiting correctly? Is there an Apache limit on the number of processes per request?

nkmaurya-dev avatar Mar 04 '20 09:03 nkmaurya-dev

It appears xdebug was enabled. Please disable xdebug and let me know if the problem persists.

trowski avatar Jul 16 '20 14:07 trowski

I have disabled xdebug, but the problem persists. I got the same issue again.

nkmaurya-dev avatar Jan 07 '21 19:01 nkmaurya-dev

If you're just downloading files, I strongly recommend using https://github.com/amphp/http-client instead, which doesn't require multiple processes. It'll be much more efficient and performant.

Zombie processes might be due to open file limits kicking in and might only be a follow-up error.

kelunik avatar Jan 07 '21 21:01 kelunik
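
To illustrate the open-file-limit point: each child process keeps descriptors open in the parent for its channel and output streams, so starting one process per item can plausibly exhaust the limit. The following is a minimal sketch in plain PHP, not the library's own remedy. The helper name `batchJobs` and the batch size of 50 are assumptions chosen for illustration; it only shows how to read the soft limit and split the jobs so that a bounded number run at once.

```php
<?php
declare(strict_types=1);

/**
 * Split download jobs into batches of at most $maxConcurrent items.
 * The function name and the default of 50 are illustrative assumptions.
 */
function batchJobs(array $items, int $maxConcurrent = 50): array
{
    if ($maxConcurrent < 1) {
        throw new InvalidArgumentException('maxConcurrent must be at least 1');
    }

    return array_chunk($items, $maxConcurrent);
}

// posix_getrlimit() is only available when the POSIX extension is loaded.
if (function_exists('posix_getrlimit')) {
    $limits = posix_getrlimit();
    echo 'Soft open-file limit: ', $limits['soft openfiles'] ?? 'unknown', PHP_EOL;
}

// Placeholder jobs with the same shape as the items in this thread.
$items = array_fill(0, 284, ['url' => 'https://example.com/a.jpg', 'path' => '/tmp/a.jpg']);
$batches = batchJobs($items);

echo count($batches), ' batches', PHP_EOL; // prints "6 batches"
```

Each batch could then be started and joined before the next one begins, which keeps the number of simultaneously open descriptors bounded by the batch size.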

As per your suggestion, I've modified the code like this:

Loop::run(static function () use ($items): \Generator {

        // Instantiate the HTTP client
        $client = HttpClientBuilder::buildDefault();

        $requestHandler = static function (string $uri) use ($client): \Generator {
            /** @var Response $response */
            $response = yield $client->request(new Request($uri));
            return yield $response->getBody()->buffer();
        };

        try {
            $promises = [];

            foreach ($items as $item) {
                $promises[$item['url']] = Amp\call($requestHandler, $item['url']);
            }

            $bodies = yield $promises;

            // The keys of $bodies are the URLs used as keys of $promises.
            foreach ($bodies as $uri => $body) {
                print $uri . " - " . \strlen($body) . " bytes" . PHP_EOL;
            }
        } catch (HttpException $error) {
            // If something goes wrong Amp will throw the exception where the promise was yielded.
            // The HttpClient::request() method itself will never throw directly, but returns a promise.
            echo $error;
        }
});

But I can't work out where to add the code that saves each downloaded file to a specific path, as I was doing in Blocking.php earlier.

nkmaurya-dev avatar Jan 07 '21 22:01 nkmaurya-dev

It'll work similarly to before. You can also stream the response body directly to disk without buffering it.

Loop::run(static function () use ($items): \Generator {
        $client = HttpClientBuilder::buildDefault();

        // $job is an array with 'url' and 'path' keys, so the parameter type is array.
        $requestHandler = static function (array $job) use ($client): \Generator {
            /** @var Response $response */
            $response = yield $client->request(new Request($job['url']));
            yield Amp\ByteStream\pipe($response->getBody(), yield Amp\File\open($job['path'], 'w'));
        };

        try {
            $promises = [];

            foreach ($items as $item) {
                $promises[$item['url']] = Amp\call($requestHandler, $item);
            }

            yield $promises;
        } catch (HttpException $error) {
            echo $error;
        }
});

kelunik avatar Jan 07 '21 22:01 kelunik

Wow, I'm able to download 284 files in a moment. I made some changes to the code to address the issues I found.

Loop::run(static function () use ($items): \Generator {
        // Instantiate the HTTP client
        $client = HttpClientBuilder::buildDefault();

        $requestHandler = static function (array $job) use ($client): \Generator {
            $part_file = $job['part'];
            /** @var Response $response */
            $response = yield $client->request(new Request($job['url']));
            yield ByteStream\pipe($response->getBody(), yield File\openFile($job['path'], 'w'));
            if (file_exists($part_file)) {
                unlink($part_file);
            }
        };
        try {
            $promises = [];
            foreach ($items as $item) {
                $promises[$item['url']] = call($requestHandler, $item);
            }
            yield $promises;
        } catch (HttpException $error) {
            echo $error;
        }
});

But at the end, the following error is thrown.

Amp\Http\Client\TimeoutException: Allowed transfer timeout exceeded, took longer than 10000 ms in /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/http-client/src/Connection/Http1Connection.php:443
Stack trace:
#0 [internal function]: Amp\Http\Client\Connection\Http1Connection->Amp\Http\Client\Connection\{closure}()
#1 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Coroutine.php(118): Generator->send(NULL)
#2 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/Placeholder.php(46): Amp\Coroutine->Amp\{closure}(NULL, NULL)
#3 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/PrivatePromise.php(23): class@anonymous->onResolve(Object(Closure))
#4 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/Placeholder.php(143): Amp\Internal\PrivatePromise->onResolve(Object(Closure))
#5 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Deferred.php(52): class@anonymous->resolve(Object(Amp\Internal\PrivatePromise))
#6 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(282): Amp\Deferred->resolve(Object(Amp\Internal\PrivatePromise))
#7 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Internal/Placeholder.php(149): Amp\Promise\{closure}(NULL, NULL)
#8 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Deferred.php(52): class@anonymous->resolve(NULL)
#9 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/byte-stream/lib/ResourceInputStream.php(198): Amp\Deferred->resolve()
#10 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/byte-stream/lib/ResourceInputStream.php(182): Amp\ByteStream\ResourceInputStream->free()
#11 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/socket/src/ResourceSocket.php(166): Amp\ByteStream\ResourceInputStream->close()
#12 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/http-client/src/Connection/Http1Connection.php(122): Amp\Socket\ResourceSocket->close()
#13 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(90): Amp\Http\Client\Connection\Http1Connection->close(Object(Amp\CancelledException))
#14 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(121): Amp\call(Array, Object(Amp\CancelledException))
#15 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CombinedCancellationToken.php(29): Amp\asyncCall(Array, Object(Amp\CancelledException))
#16 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(90): Amp\CombinedCancellationToken->Amp\{closure}(Object(Amp\CancelledException))
#17 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/functions.php(121): Amp\call(Object(Closure), Object(Amp\CancelledException))
#18 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CombinedCancellationToken.php(29): Amp\asyncCall(Object(Closure), Object(Amp\CancelledException))
#19 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CancellationTokenSource.php(92): Amp\CombinedCancellationToken->Amp\{closure}(Object(Amp\CancelledException))
#20 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CancellationTokenSource.php(77): class@anonymous->invokeCallback(Object(Closure))
#21 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/CancellationTokenSource.php(161): class@anonymous->Amp\{closure}(Object(Amp\CancelledException))
#22 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/TimeoutCancellationToken.php(30): Amp\CancellationTokenSource->cancel(Object(Amp\TimeoutException))
#23 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop/NativeDriver.php(111): Amp\TimeoutCancellationToken::Amp\{closure}('bnq', NULL)
#24 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop/Driver.php(138): Amp\Loop\NativeDriver->dispatch(true)
#25 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop/Driver.php(72): Amp\Loop\Driver->tick()
#26 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/vendor/amphp/amp/lib/Loop.php(95): Amp\Loop\Driver->run()
#27 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/src/Async/AsyncFactory.php(93): Amp\Loop::run(Object(Closure))
#28 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/src/Async/AsyncFactory.php(97): Blended\hostlib\Async\AsyncFactory->process(Array)
#29 /var/www/html/wp-async/wp-content/themes/blended_fw/hostlib/src/Backend.php(666): Blended\hostlib\Async\AsyncFactory->collect(Array)

nkmaurya-dev avatar Jan 07 '21 23:01 nkmaurya-dev
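
The `TimeoutException` above reports a transfer timeout of 10000 ms, which applies to each request individually. One possible way around it is to raise that timeout per request with `Request::setTransferTimeout()`. This is a sketch, assuming amphp/http-client v4 installed via Composer (as the `Loop::run` style in this thread suggests). The helper name `buildDownloadRequest` and the 120000 ms value are hypothetical.

```php
<?php
declare(strict_types=1);

// Assumes a Composer install of amphp/http-client v4 next to this script.
require __DIR__ . '/vendor/autoload.php';

use Amp\Http\Client\Request;

/**
 * Build a request with a longer transfer timeout than the 10000 ms
 * reported in the exception above. The helper name and the default of
 * 120000 ms are illustrative assumptions, not recommended values.
 */
function buildDownloadRequest(string $url, int $transferTimeoutMs = 120000): Request
{
    $request = new Request($url);

    // In v4 the timeout is given in milliseconds.
    $request->setTransferTimeout($transferTimeoutMs);

    return $request;
}
```

In the request handler, `new Request($job['url'])` would then become `buildDownloadRequest($job['url'])`. Starting every download at once means the transfers share the available bandwidth, so a slow one may still exceed a raised limit. Capping the number of concurrent requests might help as well.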

I believe zombie process issues were fixed by https://github.com/amphp/process/commit/8c769ff2b3ed7f507640f0df25409ef97a3563a8 and https://github.com/amphp/process/commit/76e9495fd6818b43a20167cb11d8a67f7744ee0f some time ago.

trowski avatar Dec 30 '22 01:12 trowski