
websocket and prioritised handlers

Open shuras109 opened this issue 3 years ago • 7 comments

Hi there! I've tried to use a priority queue (taken straight from the https://www.boost.org/doc/libs/1_77_0/doc/html/boost_asio/example/cpp03/invocation/prioritised_handlers.cpp example) with the Beast websocket, and mostly it worked fine. But under stress testing with more than 20 thousand connections I started to see a problem which is not present when running without the prioritised handler queue. In my stress-test scenario, each websocket is loaded with binary frames coming in and out at a moderate rate, and there are also occasional ping-pong exchanges at a slow, steady pace. My problem is that some connections somehow become stuck: an async read just never returns, and even closing the low-level socket doesn't force the handler to finish. I've spent some time narrowing down the conditions for this event, and as it turns out, there is always a packet received with two ws frames inside: a binary frame and a ping. I'm not sure yet whether this packet was created in that form originally or concatenated by a network adapter offload mechanism, but every case of a stuck ws connection I've seen was preceded by a packet with binary and ping frames inside. Is there a problem inside the websocket implementation? Should it be compatible with a prioritised handler queue? Or should I stay away from it?
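For context, the combination looks roughly like this (a minimal sketch, assuming the `handler_priority_queue` class from the linked example with its `wrap(priority, handler)` member; the priority value and `do_read` are just illustrative):

```cpp
#include <boost/asio.hpp>
#include <boost/beast.hpp>
// ... plus the handler_priority_queue class from the linked Asio example

handler_priority_queue pri_queue;

void do_read(boost::beast::websocket::stream<boost::asio::ip::tcp::socket>& ws,
             boost::beast::flat_buffer& buffer)
{
    // Wrap the read completion with a priority; larger values run first
    // in the example's ordering. The value 0 here is just illustrative.
    ws.async_read(buffer,
        pri_queue.wrap(0,
            [&ws, &buffer](boost::system::error_code ec, std::size_t)
            {
                if (ec)
                    return;
                // ... handle the binary frame ...
                buffer.consume(buffer.size());
                do_read(ws, buffer); // issue the next read
            }));
}

// The example's event loop then drains handlers in priority order:
//   while (io_context.run_one() > 0)
//   {
//       while (io_context.poll_one())
//           ;
//       pri_queue.execute_all();
//   }
```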

shuras109 avatar Sep 23 '21 13:09 shuras109

I don't think it should be a problem that more than one frame has been sent in the same TCP message. We treat the underlying socket as a stream, so we have no concept of TCP message boundaries in the websocket stream code (the websocket stream would work if connected to a file handle or a pipe, for example).
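To illustrate: the underlying stream type is a template parameter, so the websocket code only ever sees a byte stream. A minimal sketch:

```cpp
#include <boost/asio.hpp>
#include <boost/beast/websocket.hpp>
#include <boost/beast/_experimental/test/stream.hpp>

namespace net = boost::asio;
namespace websocket = boost::beast::websocket;

int main()
{
    net::io_context ioc;

    // Over TCP: frames arrive however the transport delivers the bytes.
    websocket::stream<net::ip::tcp::socket> ws_tcp{ioc};

    // The same stream code compiles over a non-TCP byte stream, e.g. the
    // in-memory test stream, which has no packets at all.
    websocket::stream<boost::beast::test::stream> ws_test{ioc};
}
```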

  1. I would like to be able to replicate your test. Would you be in a position to share the steps to create the test client and share the (complete) code that is failing?
  2. I would like a copy of the contents of the network packet in which there is both the binary frame and the ping, so that I can build a test to check the interaction of the parser and the transport layer.

What you are doing sounds more complex than this example, and it might be that you have a race condition.

Anecdotally, I have tested beast in a production server while handling 80,000 concurrent connections. That server has been running for three years. I suspect that if there was an inherent logic error, I would have seen it, but it is always possible that we are doing something different.

madmongo1 avatar Sep 23 '21 13:09 madmongo1

The error only pops up for me after adding the priority queue; with a plain io_context there is no problem. I will try to create a simple test based on one of the echo server tests, and I will provide a pcap dump in a few minutes.

shuras109 avatar Sep 23 '21 14:09 shuras109

Dump of a stream with the problem. Look at packet No 652: ws.error.pcap.gz

shuras109 avatar Sep 23 '21 15:09 shuras109

> Dump of a stream with the problem. Look at packet No 652: ws.error.pcap.gz

I see only one frame in that TCP message, though, and we sent back a pong.

madmongo1 avatar Sep 23 '21 17:09 madmongo1

I've tried to replicate my problem in a simple test and it doesn't show up there. I will dig deeper into my code, looking for an error of my own. Luckily, I can reproduce the error with just two concurrent sessions.

shuras109 avatar Sep 27 '21 12:09 shuras109

Great. This is often the case. The most common violations of asio/beast preconditions are (a sketch that avoids all of them follows this list):

  • not protecting a connection with a strand when there is more than one thread, resulting in data races.
  • initiating more than one async_write per stream at a time.
  • invoking methods on a stream from one thread while an async operation is making progress on another.
  • not ensuring that the buffers, or the stream itself, outlive all async operations initiated on the stream.
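Here is a minimal sketch of a per-connection session that avoids all four (the `session` class, `write_queue_`, and `send` are hypothetical names, not a Beast API; the pattern follows Beast's async examples):

```cpp
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <deque>
#include <memory>
#include <string>

namespace beast = boost::beast;
namespace net   = boost::asio;
using tcp       = net::ip::tcp;

class session : public std::enable_shared_from_this<session>
{
    beast::websocket::stream<beast::tcp_stream> ws_;
    std::deque<std::shared_ptr<std::string const>> write_queue_;

public:
    // The socket should already be bound to a strand, e.g. accepted via
    // acceptor.async_accept(net::make_strand(ioc), ...), so that all
    // completion handlers for this connection are serialised.
    explicit session(tcp::socket&& sock) : ws_(std::move(sock)) {}

    void send(std::shared_ptr<std::string const> msg)
    {
        // Hop onto the strand before touching the queue, so we never
        // invoke stream methods from a foreign thread.
        net::post(ws_.get_executor(),
            [self = shared_from_this(), msg]
            {
                self->write_queue_.push_back(msg);
                if (self->write_queue_.size() == 1) // no write in flight
                    self->do_write();
            });
    }

private:
    void do_write()
    {
        // Only one async_write at a time; shared_from_this keeps the
        // stream and the buffer alive until the operation completes.
        ws_.async_write(net::buffer(*write_queue_.front()),
            [self = shared_from_this()](beast::error_code ec, std::size_t)
            {
                if (ec)
                    return;
                self->write_queue_.pop_front();
                if (!self->write_queue_.empty())
                    self->do_write(); // chain the next queued write
            });
    }
};
```

The key points are that every touch of the stream or the queue happens on the strand, and that `do_write` chains writes so at most one is outstanding per stream.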

madmongo1 avatar Sep 28 '21 06:09 madmongo1

This issue has been open for a while with no activity; has it been resolved?

stale[bot] avatar Jan 09 '22 03:01 stale[bot]

@shuras109 what's the state of this issue?

klemens-morgenstern avatar Sep 24 '22 05:09 klemens-morgenstern