seastar icon indicating copy to clipboard operation
seastar copied to clipboard

websocket: Fix websocket data being modelled as a stream internally

Open p12tic opened this issue 9 months ago • 3 comments
trafficstars

Currently WebSocket implementation has the following input data path:

input_stream -> websocket parser -> queue -> data_sink -> input_stream wrapper

On the output side, the data path is as follows:

output_stream wrapper -> data_source -> queue -> websocket serializer -> output_stream

The input_stream and output_stream wrappers are what is exposed to the user. This is problematic, because WebSocket is a message-based protocol and streams do not have the concept of messages, they model data as a stream of bytes. As a result, the payloads that the user sends or receives will not correspond to a single websocket message, breaking the underlying protocol in most cases. E.g. json or protobuf payloads no longer contain valid data and so on.

Currently this behavior is seen on high load, when the stream wrappers start to coalesce and split messages in the write path due to more than one message being available at a time.

The solution is to expose data_sink and data_source that are backed by message queues directly to the user.

p12tic avatar Feb 06 '25 23:02 p12tic

More specifically, the current code results in problems on high load, when the stream wrappers start to coalesce and split messages in the write path due to more than one message being available at a time.

Why is that a problem? The way it's described here it looks like -- a code sends two messages into output stream, and those two messages are squashed together and are sent out on a wire. It sounds as a nice optimization to me, not a problem

xemul avatar Feb 07 '25 06:02 xemul

Why is that a problem? The way it's described here it looks like -- a code sends two messages into output stream, and those two messages are squashed together and are sent out on a wire. It sounds as a nice optimization to me, not a problem

The problem is that the parts of more than one message can be squashed into one websocket message. This may completely break the protocol of the payload - json or protobuf encoded payloads no longer decode successfully and so on.

I ~will edit~ have edited the commit and PR descriptions to make this clear

p12tic avatar Feb 07 '25 10:02 p12tic

The problem is that the parts of more than one message can be squashed into one websocket message. This may completely break the protocol of the payload - json or protobuf encoded payloads no longer decode successfully and so on.

Would be nice to see the reproducer (read: a unit test that fails) of that

xemul avatar Feb 10 '25 14:02 xemul