After updating from Swoole 5.1.3 to Swoole 5.1.4, the server starts to randomly respond with 503 status

Open volodymyr-hordiienko opened this issue 1 year ago • 6 comments

After updating from Swoole 5.1.3 to Swoole 5.1.4, the server starts to randomly respond with 503 status, and after heavy load it responds with 503 every time, even though resource usage looks fine.

Downgrading to Swoole 5.1.3 fixes the problem.

The project uses Hyperf and runs on hyperf/hyperf:8.3-alpine-v3.19-swoole-v5.1.3; the upgrade was to hyperf/hyperf:8.3-alpine-v3.20-swoole-v5.1.4.

volodymyr-hordiienko avatar Oct 03 '24 13:10 volodymyr-hordiienko

Hello, do you use curl and coroutines? It may be related to curl, as Alpine 3.20 switched to curl 8.10.1 and this was already reported: https://github.com/curl/curl/issues/15127. Alpine 3.19 ships an older curl version.

hydrapolic avatar Oct 03 '24 21:10 hydrapolic

@hydrapolic

Thanks a lot for the reply and the direction!

Yes, I use the Hyperf\Guzzle client heavily with coroutines. I'm not sure whether it uses curl underneath, but it's possible. Strangely, it happens silently: just 503 on all HTTP requests, even requests that do not use Guzzle. After a server restart it works well for some time, but 503s become more and more frequent until the whole server is stuck responding with 503. There are no errors in the app logs; AMQP consumers, background tasks, etc. keep working, and only HTTP requests are affected.

Does this mean we need to wait until Swoole supports curl 8.10.1 before upgrading Alpine?

I know I need to provide more details, but I have no logs with errors, so I don't know what information I could provide here :) I can't provide code that reproduces the issue 100% of the time; it reproduces only under a huge number of requests, and locally everything works fine.

volodymyr-hordiienko avatar Oct 03 '24 21:10 volodymyr-hordiienko

This seems to be an issue caused by the last PR related to process restart and resetting some global variables. Please wait a moment, and I will fix it shortly.

NathanFreeman avatar Oct 04 '24 00:10 NathanFreeman

Has the server been configured with the max_concurrency option? @volodymyr-hordiienko

matyhtf avatar Oct 04 '24 01:10 matyhtf

Actually, we discovered this on OpenSwoole (v22.1.2). Our CI pipeline started to fail on 1.10.2024 with signal 11 kills, which was when the newer curl was added to Alpine: https://gitlab.alpinelinux.org/alpine/aports/-/commit/e17ad295e0810e938f9c4d61c21e44eaf1a59b51 https://gitlab.alpinelinux.org/alpine/aports/-/commit/badfda8e31a0b6e87e78237d7ee4455b35cd4c10

After switching back to Alpine 3.19 and curl 8.9.1, the segfaults were gone instantly.

hydrapolic avatar Oct 04 '24 06:10 hydrapolic

@matyhtf No, it's not. Here is my server config:


<?php

declare(strict_types=1);
/**
 * This file is part of Hyperf.
 *
 * @link     https://www.hyperf.io
 * @document https://hyperf.wiki
 * @contact  group@hyperf.io
 * @license  https://github.com/hyperf/hyperf/blob/master/LICENSE
 */
use Hyperf\Server\Event;
use Hyperf\Server\Server;
use Swoole\Constant;
return [
    'mode' => SWOOLE_PROCESS,
    'servers' => [
        [
            'name' => 'http',
            'type' => Server::SERVER_HTTP,
            'host' => '0.0.0.0',
            'port' => 80,
            'sock_type' => SWOOLE_SOCK_TCP,
            'callbacks' => [
                Event::ON_REQUEST => [Hyperf\HttpServer\Server::class, 'onRequest'],
            ],
            'options' => [
                'enable_request_lifecycle' => true,
            ],
        ],
        [
            'name' => 'socket-io',
            'type' => Server::SERVER_WEBSOCKET,
            'host' => '0.0.0.0',
            'port' => 8001,
            'sock_type' => SWOOLE_SOCK_TCP,
            'callbacks' => [
                Event::ON_HAND_SHAKE => [Hyperf\WebSocketServer\Server::class, 'onHandShake'],
                Event::ON_MESSAGE => [Hyperf\WebSocketServer\Server::class, 'onMessage'],
                Event::ON_CLOSE => [Hyperf\WebSocketServer\Server::class, 'onClose'],
            ],
        ],
    ],
    'settings' => [
        Constant::OPTION_LOG_LEVEL => 4,
        Constant::OPTION_ENABLE_COROUTINE => true,
        Constant::OPTION_WORKER_NUM => swoole_cpu_num(),
        Constant::OPTION_PID_FILE => BASE_PATH . '/runtime/hyperf.pid',
        Constant::OPTION_OPEN_TCP_NODELAY => true,
        Constant::OPTION_MAX_COROUTINE => 100000,
        Constant::OPTION_OPEN_HTTP2_PROTOCOL => true,
        Constant::OPTION_MAX_REQUEST => 100000,
        Constant::OPTION_SOCKET_BUFFER_SIZE => 2 * 1024 * 1024,
        Constant::OPTION_BUFFER_OUTPUT_SIZE => 2 * 1024 * 1024,
        Constant::OPTION_DOCUMENT_ROOT => BASE_PATH . '/storage',
        Constant::OPTION_STATIC_HANDLER_LOCATIONS => ['/public'],
        Constant::OPTION_ENABLE_STATIC_HANDLER => true,
        Constant::OPTION_HTTP_COMPRESSION => true,
        Constant::OPTION_HTTP_COMPRESSION_LEVEL => 4,
    ],
    'callbacks' => [
        Event::ON_WORKER_START => [Hyperf\Framework\Bootstrap\WorkerStartCallback::class, 'onWorkerStart'],
        Event::ON_PIPE_MESSAGE => [Hyperf\Framework\Bootstrap\PipeMessageCallback::class, 'onPipeMessage'],
        Event::ON_WORKER_EXIT => [Hyperf\Framework\Bootstrap\WorkerExitCallback::class, 'onWorkerExit'],
    ],
];

volodymyr-hordiienko avatar Oct 04 '24 06:10 volodymyr-hordiienko

@hydrapolic https://github.com/curl/curl/pull/15206

The official curl repository has resolved this bug.

matyhtf avatar Oct 21 '24 10:10 matyhtf

Version 5.1.4 attempted to correct the concurrency counter after a crash or a fatal PHP error. However, the code had a race condition when reading and writing the counter, which caused it to overflow and end up set to UINT_MAX. This triggered the max concurrency limit and caused the 503 errors. This section has been refactored in the latest code, and the bug will be fixed in version 5.1.5.

matyhtf avatar Oct 21 '24 10:10 matyhtf