doc-en icon indicating copy to clipboard operation
doc-en copied to clipboard

stream_copy_to_stream() with sockets insufficiently documented

Open mikhainin opened this issue 3 years ago • 4 comments

Description

The code is a little big, so let me explain first.

If the source socket was created as blocking, stream_copy_to_stream returns as soon as default_socket_timeout is reached, before the actual EOF happened.

That's probably the expected behaviour but we couldn't find this in the documentation, maybe that should be improved. I mean, the documentation doesn't mention that stream_copy_to_stream can return because of timeout which may cause a perception that it returns on one of 3 conditions:

  • stream is over (EOF)
  • max length is reached
  • error

None of these conditions actually happens in our case

The behaviour changed recently if the source socket was created from stream_socket_accept on a non-blocking socket on Linux (it looks related to php/php-src#8472, php/php-src#9252).

The following code:

<?php
const SEND_BYTES = 10 * 1024;

if (pcntl_fork() == 0) {
    doSender();
}

doServer();
pcntl_wait($status); // wait for the sender process

function doSender() {
    $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_connect($socket, '127.0.0.1', 9638);

    $string = str_repeat("0", SEND_BYTES);
    $written = 0;

    for ($i = 0; $i < 2; ++ $i) {
        $bytes = socket_write($socket, $string . "\n");
        assert($bytes !== false, "server died");

        $written += $bytes;

        sleep((int)ini_get('default_socket_timeout') * 2);
    }

    printf("written: %d\n", $written);
    exit;
}


function doServer() {
    $server = new Server;
    $server->doServer();
}

class Server
{
    private $socket_server;

    function doServer()
    {
        $this->socket_server = $this->startSocketServer();

        while (true) {
            //stream_socket_accept emits a warning if interrupted by a signal
            if (@$client = stream_socket_accept($this->socket_server, 0)) {
                $this->handleClient($client);
                break;
            }

            usleep(10000); //0.01 sec
        }

        fclose($this->socket_server);
    }

    function startSocketServer()
    {
        $socket_server = stream_socket_server("tcp://0.0.0.0:9638", $errno, $errstr);
        assert(!empty($socket_server), "Could not start socket server [$errno]: $errstr");

        if (!stream_set_blocking($socket_server, false)) {
            assert(false, "Could not set socket server to non-blocking mode");
        }

        return $socket_server;
    }

    protected function handleClient($client)
    {
        $handle = fopen("/dev/null", 'w');
        assert(!empty($handle), "Could not open client output stream for write");

        $bytes_copied = stream_copy_to_stream($client, $handle);
        printf("read %d bytes\n", $bytes_copied);

        fclose($handle);
        fclose($client);
    }
}

Resulted in this output:

$ php -ddefault_socket_timeout=2 socket-test.php
read 10241 bytes
written: 20482

But I expected this output instead (PHP 8.0.22 returned):

$ php -ddefault_socket_timeout=2 socket-test.php
written: 20482
read 20482 bytes

PHP Version

PHP 8.0.25

Operating System

CentOS Stream 8

mikhainin avatar Nov 23 '22 17:11 mikhainin

Indeed, the behavioral change is due to fixing php/php-src#8472. @bukka, could you please have a look?

cmb69 avatar Nov 24 '22 12:11 cmb69

The change of the behavior is actually a bug fix because previously we just assumed that accept inherits O_NONBLOCK but that's not the case on Linux so we ended up with blocking socket to be treated as non blocking.

I did some investigation and this is exactly what was happening in your example. As you noticed the blocking stream always considers timeout unless it is explicitly set to infinite (-1). That's a prevention to block the script forever by default - something that most users don't like. Obviously the provided examples expected blocking behavior for $client here so the timeout should be applied here as well because that would be the case for any other blocking stream. However due to the mentioned bug the stream was marked as non blocking but internally the socket was still blocking. What happened is that streams internal logic treated the read as it would be non blocking but it actually blocked because the incorrect marking. Such blocking doesn't respect any timeout because it's not expected to happen in the first place. Metadata also suggested that the stream is non blocking and users could think that it won't block so we can leave it like that as it is obviously a bug to pretend that something doesn't block but it actually blocks. I hope you see how wrong was the previous behavior and why we needed to fix it.

If you want the previous behavior, you should just set the timeout to infinite. You can change either the default value if you want it for all stream operation or you can change it directly on the stream by stream_set_timeout. Or if you want to make it truly non blocking, mark the $client as non blocking using stream_set_blocking and then use stream_select before copying (in the loop).

One thing that I should note is that stream_socket_accept non blocking behavior is not the same on all platforms. It is actually inherited on other platforms so for example on Mac / FreeBSD / Windows, you would get non blocking behavior. So it's better to explicitly set it as blocking or non blocking (depending on your approach) if you want your code to work everywhere in the same way.

bukka avatar Nov 25 '22 11:11 bukka

Hey @bukka, I also had quite a comprehensive investigation of those fixes and this problem. Generally, I tend to agree that sockets should never be used without timeouts -- even when users want this, it is usually a wrong decision.

Basically, I wanted you to confirm that it is an expected behaviour.

However, there is a second part of the issue. Looking at the documentation for stream_copy_to_stream, this behaviour looks really surprising - I asked 3 people and they didn't expect that timeout to apply there :)

Just wanted to ask how you feel about the possibility to extend the doc to make an explicit statement about timeouts?

mikhainin avatar Nov 25 '22 12:11 mikhainin

I think it would be a good idea to extend docs and clarify what stream_copy_to_stream really does. There are some difference between copying files and sockets. The files (stdio) is quite straight forward from the user point of view but I agree that sockets are not really clearly documented so users don't know what to expect. It's pretty much a wrapper around read and write - basically this https://github.com/php/php-src/blob/d526773d20ca091201685c904246182cc313a7b1/main/streams/streams.c#L1680-L1713 . So if we could clarify that, it would be useful IMO. Feel free to create a PR in doc-en repo and add me a reviewer.

I'm going to close this as this behavior is expected and there's nothing to do in core code.

bukka avatar Nov 25 '22 13:11 bukka