beast icon indicating copy to clipboard operation
beast copied to clipboard

inactivity_timeout is triggered regardless of having received pings

Open xim opened this issue 1 year ago • 13 comments

Version of Beast

347

Steps necessary to reproduce the problem

I'm using plain, async boost::beast::websocket for a server and a client.

I have a server that's sending ping packets at a 15s interval and keeping the connection alive correctly. I have verified that the client receives these and responds with pongs.

Since I have these ping packets going, I've added a timeout in the client to try to detect if the connection goes down:

ws_.set_option(boost::beast::websocket::stream_base::timeout{
    60s,
    30s,
    false
});

I had assumed that this would be kept alive by the ping packets sent from the server, but it isn't. The client disconnects with the message The socket was closed due to a timeout.

Setting the final flag (keep_alive_pings) to true makes the client keep the connection alive by doing the same as the server's already doing. Given the server set to send ping frames at 15s intervals, this setting on the client:

ws_.set_option(boost::beast::websocket::stream_base::timeout{
    60s,
    80s,
    true
});

Means I'll get this behaviour:

00:00 Client connects
00:15 Server pings as expected
00:15 Client pongs as required
00:30 Server pings as expected
00:30 Client pongs as required
00:40 Client pings despite having seen activity in the form of a ping

I feel is the desired behaviour would be to count incoming ping frames as activity. If that can't be made the default behaviour, could we try to add it as a config on boost::beast::websocket::stream_base::timeout?

xim avatar Aug 10 '23 20:08 xim

I made a somewhat-minimal reproduction at https://gist.github.com/xim/06d60c455363fbb764c14f21d7705c40

Compiled and run by running each in a separate terminal:

g++ -g -std=gnu++17 websocket_server_async.cpp && echo running && ./a.out 127.0.0.1 8080 1
g++ -g -std=gnu++17 websocket_client_async.cpp && ./a.out 127.0.0.1 8080 'test'

xim avatar Aug 10 '23 21:08 xim

What are you using as the tcp stream? Are you calling async_read perpetually?

klemens-morgenstern avatar Aug 11 '23 00:08 klemens-morgenstern

As shown in the gist, I made a somewhat-minimal reproduction with the example websocket_client_async.cpp as a base.

It's a websocket::stream<beast::tcp_stream> and I re-call async_read in the on_read handler

xim avatar Aug 11 '23 04:08 xim

Full code instead of patch at https://gist.github.com/xim/97874f6d7e27402f26228a0da6448236

Here, the server sends pings ever 1s and the client has a 5s timeout.

Client output:

ping
ping
ping
ping
read: The socket was closed due to a timeout

Server output:

pong
pong
pong
pong
read: End of file

xim avatar Aug 11 '23 07:08 xim

Made a PR. If you want this to be configurable, I can have a stab at that as well.

xim avatar Aug 11 '23 12:08 xim

I couldn't find any indication in RFC 6455 that either permits or prohibits the counting of control frames as a form of activity. Considering that this approach could potentially eliminate the need for extra ping-pong messages every n seconds for clients with enabled keep_alive_pings, introducing an additional parameter to the configuration could unnecessarily complicate matters for a feature that might see very limited use.

The only plausible options appear to be either counting pings as activity (assuming the standard allows for it), or maintaining the status quo. Users could opt to utilize the low-level API and manually implement ping-pong if they wish to minimize the transmission of extra ping-pong messages.

ashtum avatar Aug 19 '23 08:08 ashtum

I couldn't find any indication in RFC 6455 that either permits or prohibits the counting of control frames as a form of activity

Agree, the most precise is the phrasing A Ping frame may serve either as a keepalive or as a means to verify that the remote endpoint is still responsive.

IMHO it makes sense for control frames to count as activity.

this approach could potentially eliminate the need for extra ping-pong messages every n seconds for clients with enabled keep_alive_pings

Exactly, good summary there.

The only plausible options appear to be either counting pings as activity (assuming the standard allows for it), or maintaining the status quo

My vote is on the former =)

xim avatar Aug 25 '23 06:08 xim

Without thinking about it too much, I believe received pings should reset the timer, as they represent activity.

Regardless of what is changed, the library documentation should be updated to describe either the current or the new behavior of received pings with respect to timeouts if it is not already explained.

vinniefalco avatar Aug 25 '23 17:08 vinniefalco

assuming the standard allows for it

The specifications say nothing about what counts as idle activity. Ideally Beast would have great defaults that work for most people, while also allowing for the caller to have perfect control over customizing that behavior. However Beast has reaching close to the end of what can be done without incurring technical debt, so this ideal will likely only be approached in a new websocket library which will replace beast (this work has not started yet).

vinniefalco avatar Aug 25 '23 17:08 vinniefalco

Without thinking about it too much, I believe received pings should reset the timer, as they represent activity.

Does that mean we should proceed towards merging the PR, @vinniefalco? I can make additional adjustments or try to document more inline if you wish!

xim avatar Aug 29 '23 12:08 xim

@xim It would be great if you could add the relevant test cases to the PR; This is the appropriate location for including them: https://github.com/boostorg/beast/blob/develop/test/beast/websocket/ping.cpp.

ashtum avatar Aug 29 '23 13:08 ashtum

I don't see how I can test this without actually waiting for timers to expire. I mean, this test passes, but sleeping in the test like this is probably not OK?

echo_server es{log};
error_code ec;
stream<test::stream> ws{ioc_};
ws.set_option(stream_base::timeout{
    stream_base::none(),
    std::chrono::milliseconds(100),
    false
});
ws.next_layer().connect(es.stream());
ws.handshake("localhost", "/");
flat_buffer b;
bool got_timeout = false;
ws.async_read(b,
    [&](error_code ec, std::size_t)
    {
        std::cerr << ec.message() << std::endl;
        if(ec != beast::error::timeout)
            BOOST_THROW_EXCEPTION(
                system_error{ec});
        got_timeout = true;
    });
BEAST_EXPECT(ws.impl_->idle_counter == 0);
ioc_.run_for(std::chrono::milliseconds(60));
ioc_.restart();
BEAST_EXPECT(ws.impl_->idle_counter == 1);
ws.ping("", ec);
if(ec)
    BOOST_THROW_EXCEPTION(
        system_error{ec});
ioc_.run_for(std::chrono::milliseconds(10));
ioc_.restart();
BEAST_EXPECT(ws.impl_->idle_counter == 0);
BEAST_EXPECT(!got_timeout);
ioc_.run_for(std::chrono::milliseconds(100));
ioc_.restart();
BEAST_EXPECT(got_timeout);

xim avatar Aug 30 '23 08:08 xim

It seems this how is how timeout cases are tested in the entire code base (searching for stream_base::timeout or test::run_for). Please ensure that you allocate a sufficient margin to prevent false failures caused by OS rescheduling; examining other test cases can provide a reference for determining the appropriate margin. @klemens-morgenstern is this the right way?

ashtum avatar Aug 30 '23 08:08 ashtum