wslay icon indicating copy to clipboard operation
wslay copied to clipboard

WSLAY_MSG_MORE annoyance

Open pps83rbx opened this issue 5 years ago • 8 comments

First of all, I don't understand what was the point to send 6-byte header followed by the message body for each websocket message to be sent. This will result in multiple tcp messages (most users of websocket probably would have TCP_NODELAY enabled).

Not only this thing is annoying to handle, it's also 1) not clear how to handle it properly, and 2) it's doesn't really do what it's supposed to do.

Regarding 1)

Imagine that I want to buffer sends manually when flags & WSLAY_MSG_MORE. In my case with simple text message I get two callbacks for 6 bytes and then for 30 bytes. When I get first 6 bytes, I copy them to my buffer and return from callback 6, when I get second chunk with flags=0 I append it to my buffer and try to send all 36 accumulated bytes in a single tcp send. If all bytes get send I return 30. This part is more or less obvious, but it's totally not clear what to do in case it my second buffer gets sent only partially (imagine that tcp send call sends only 29 or 31 bytes out of 36 requested; thus I'm left with 7 or 5 bytes in my own buffer). What should I return back to wslay in this case? I still return 30, and do wslay_event_set_error(ctx, WSLAY_ERR_WOULDBLOCK); Is that correct?.. but then how will wslay unblock or make me send the remaining 7 or 5 bytes from my own buffer if wslay itself doesn't have anything else pending to be sent?.. will it call me with empty buffer to be sent so that I could flush my own buffer?.. because of this design remote won't receive complete message and will be waiting for the remaining part of the message, isn't it?.. In other words, there is no change a user of the wslay lib could handle that option correctly (unless it simply passes it on to tcp send call). In my case I use curl_easy_send and I do not have an option to pass on WSLAY_MSG_MORE flag.

Regarding 2)

The whole thing with WSLAY_MSG_MORE seems to be poorly implemented and signals only this weird unnecessary fragmentation that wslay does, but DOES NOT signal properly if there is more data to be sent. For example, if I queue a 3 websocket messages, I end up with this weird sequence of callbacks:

send size=6, flags=WSLAY_MSG_MORE
send size=30, flags=0
send size=6, flags=WSLAY_MSG_MORE
send size=30, flags=0
send size=6, flags=WSLAY_MSG_MORE
send size=30, flags=0

What I would consider to be correct behavior is to receive these 3 callbacks instead:

send size=36, flags=WSLAY_MSG_MORE
send size=36, flags=WSLAY_MSG_MORE
send size=36, flags=0

End even better, it should have provided an option to "cork" the pipe, or configure buffer size so that it may as well produce this single callback call with all 3 messages in a single buffer:

send size=108, flags=0

This lib seems to be fairly popular, no idea how it can be used the way it'd implemented now.

pps83rbx avatar Sep 25 '19 02:09 pps83rbx

Current wireshark trace with wslay (I use wslay as a client, I use nodejs as a server on port=1122): image

If I manually handle WSLAY_MSG_MORE: image

And that's the way it should really be implemented to send all messages in a single TCP send:

image

The reason I discovered the problem was because in stress tests fairly optimized c++ code that used wslay as webscoket backend library was no match to vanilla nodejs websocket server that I wrote in 10 minutes: cpu of my app was always higher than cpu load taken by nodejs server that did way more than simply sending messages

pps83rbx avatar Sep 25 '19 02:09 pps83rbx

I can confirm this performance issue. A possible fix would be to merge the SEND_HEADER with the SEND_PAYLOAD state.

image

marcel303 avatar Sep 20 '20 17:09 marcel303

Any comments @tatsuhiro-t ?

pps83 avatar Sep 21 '20 20:09 pps83

Merging SEND_HEADER with SEND_PAYLOAD seems like it would be a good idea. One way to do that would be to change wslay_event_omsg_non_fragmented_init() so that the payload data is allocated with (*m)->data = (uint8_t*)malloc(msg_length + 14); - giving enough space for the header data - then copying the data at position 14 onwards. The header could then reside in the extra 14 bytes at the beginning of the buffer, allowing a single write.

ralight avatar Jan 10 '21 23:01 ralight

Finally, I made a progress on this. I added wslay_event_write which writes pending messages to a given buffer instead of calling send_callback. This avoids WSLAY_MSG_MORE and small packet write. Still it is not ideal because it needs extra data copy. It is slightly more efficient than an application buffers small write into its own buffer. That said, it has less moving parts, it would be rather simple to use. It has different semantics from wslay_event_send, so in order to adopt it, you need to make some changes to your application. I also made some optimizations by reducing malloc calls and tightens the code with extra compiler flags to modernize aging code base.

tatsuhiro-t avatar Jan 14 '21 14:01 tatsuhiro-t

The ideal API for me would be a form of wslay_event_send, where the application provides a heap allocated buffer and wslay takes ownership of that heap memory and frees it when it is sent. Additionally requiring the application to allocate an extra 14 bytes at the beginning of the buffer for the websockets header data, as I mention above, would mean there would be no need to allocate or copy the payload in wslay, and there would be no small writes either.

ralight avatar Jan 15 '21 12:01 ralight

For reference, wslay_event_write is added by Add wslay_event_write and wslay_frame_write

pps83 avatar May 01 '21 21:05 pps83

HI! I've made a little fork of wslay that uses @ralight idea. https://github.com/Softmotions/iwnet/blob/master/src/wslay/wslay_frame.c#L65-L200 The approach is a bit hacky but works well.

adamansky avatar Aug 03 '22 15:08 adamansky