swoole-src icon indicating copy to clipboard operation
swoole-src copied to clipboard

Question: custom dispatch connection close

Open sshymko opened this issue 6 years ago • 4 comments

Imagine a custom dispatch function based on an HTTP request message, such as request path, host, or cookies, etc. It would extract a request identifier from the request message and resolve it to a worker ID via the standard formula crc32($requestId) % $server->setting['worker_num'].

Naive implementation of dispatch by URL path:

$dispatcher = function (\Swoole\Server $server, $fd, $type, $data) {
    if (preg_match('/^GET\s+(?<path>[^\s]+)/', $data, $matches)) {
        $requestId = crc32($matches['path']);
    } else {
        $requestId = $fd - 1;
    }
    $workerId = $requestId % $server->setting['worker_num'];
    return $workerId;
};

$server = new \Swoole\Http\Server('127.0.0.1', 8080);
$server->set(['dispatch_func' => $dispatcher]);
$server->on('request', function ($request, $response) use ($server) {
    $response->header('Content-Type', 'text/plain');
    $response->end("Served by worker {$server->worker_id}\n");
});
$server->start();

There's a problem with the straightforward implementation - the dispatcher function is called 2 times for a single request:

  1. Data fetch $type == 10
  2. Connection close $type == 4

Unlike the 1st call, the 2nd call does not have access to the request data, so it cannot perform the same worker ID resolution as the 1st call to dispatch to the same worker process.

Revised implementation caching result of the 1st call:

class UrlPathDispatcher
{
    protected $dispatchMap = [];

    public function __invoke(\Swoole\Server $server, $fd, $type, $data)
    {
        if (isset($this->dispatchMap[$fd])) {
            $workerId = $this->dispatchMap[$fd];
        } else {
            $workerId = $this->dispatch($server, $fd, $type, $data);
            $this->dispatchMap[$fd] = $workerId;
        }
        if ($type == 4/*connection close*/) {
            unset($this->dispatchMap[$fd]);
        }
        return $workerId;
    }

    protected function dispatch(\Swoole\Server $server, $fd, $type, $data)
    {
        if (preg_match('/^GET\s+(?<path>[^\s]+)/', $data, $matches)) {
            $requestId = crc32($matches['path']);
        } else {
            $requestId = $fd - 1;
        }
        $workerId = $requestId % $server->setting['worker_num'];
        return $workerId;
    }
}

$dispatcher = new UrlPathDispatcher();

The implementation has become considerably more complicated and it now requires to maintain the state, so it cannot be a simple function anymore. It would be great to avoid overcomplicating the implementation of dispatchers by freeing them from handling the connection closure.

Conceptual questions:

  1. Do dispatch functions need to resolve all calls of the same request to the same worker ID?
  2. What would happen if the dispatch function returns different worker ID for connection close?
  3. Is the revised implementation above conceptually correct, particularly use of $fd as cache key?
  4. What is the recommended implementation approach?

Thanks in advance for responding to this ticket which is a question rather than an issue.

sshymko avatar Mar 22 '19 05:03 sshymko

It is recommended to use the second dispatcher, otherwise problems may occur

windrunner414 avatar Mar 22 '19 07:03 windrunner414

The approach of caching the dispatch results by connection ID $fd demonstrated above does NOT even allow to implement an algorithm as simple as Round-Robin. Multiple requests of the same connection will end up being routed to the same worker ID rather than all workers sequentially. There seems to be nothing else besides $fd that can be used as a cache key.

The best implementation of the Round-Robin I could come up with does not seem reliable when the next request comes in before the previous one finishes and closes the connection: https://github.com/upscalesoftware/swoole-dispatch/blob/master/src/RoundRobin.php

sshymko avatar Mar 22 '19 16:03 sshymko

The fallback dispatch by the Client ID ($fd - 1) % $server->setting['worker_num'] is emulation of the built-in fixed client mode dispatch_mode = 2. The problem of duplicating the dispatch modes is raised in #2451.

sshymko avatar Mar 24 '19 06:03 sshymko

Hm, multiple requests can be handled through the same persistent connection. So, there will be multiple calls to the dispatch function with request data ($type == 10) before the connection close call ($type == 4). All those requests can be dispatched to different workers, such as via Round-Robin, cannot they?

@twose, @matyhtf, Does it even matter what worker performs the connection close? It appears that any worker can close the connection correctly. Is the complicated dispatch logic shown above unnecessary?

sshymko avatar Mar 24 '19 21:03 sshymko