wslay
wslay copied to clipboard
WSLAY_MSG_MORE annoyance
First of all, I don't understand what was the point to send 6-byte header followed by the message body for each websocket message to be sent. This will result in multiple tcp messages (most users of websocket probably would have TCP_NODELAY enabled).
Not only this thing is annoying to handle, it's also 1) not clear how to handle it properly, and 2) it's doesn't really do what it's supposed to do.
Regarding 1)
Imagine that I want to buffer sends manually when flags & WSLAY_MSG_MORE
. In my case with simple text message I get two callbacks for 6 bytes and then for 30 bytes. When I get first 6 bytes, I copy them to my buffer and return from callback 6, when I get second chunk with flags=0 I append it to my buffer and try to send all 36 accumulated bytes in a single tcp send. If all bytes get send I return 30. This part is more or less obvious, but it's totally not clear what to do in case it my second buffer gets sent only partially (imagine that tcp send call sends only 29 or 31 bytes out of 36 requested; thus I'm left with 7 or 5 bytes in my own buffer). What should I return back to wslay in this case? I still return 30, and do wslay_event_set_error(ctx, WSLAY_ERR_WOULDBLOCK);
Is that correct?.. but then how will wslay unblock or make me send the remaining 7 or 5 bytes from my own buffer if wslay itself doesn't have anything else pending to be sent?.. will it call me with empty buffer to be sent so that I could flush my own buffer?.. because of this design remote won't receive complete message and will be waiting for the remaining part of the message, isn't it?.. In other words, there is no change a user of the wslay lib could handle that option correctly (unless it simply passes it on to tcp send call). In my case I use curl_easy_send
and I do not have an option to pass on WSLAY_MSG_MORE
flag.
Regarding 2)
The whole thing with WSLAY_MSG_MORE seems to be poorly implemented and signals only this weird unnecessary fragmentation that wslay does, but DOES NOT signal properly if there is more data to be sent. For example, if I queue a 3 websocket messages, I end up with this weird sequence of callbacks:
send size=6, flags=WSLAY_MSG_MORE
send size=30, flags=0
send size=6, flags=WSLAY_MSG_MORE
send size=30, flags=0
send size=6, flags=WSLAY_MSG_MORE
send size=30, flags=0
What I would consider to be correct behavior is to receive these 3 callbacks instead:
send size=36, flags=WSLAY_MSG_MORE
send size=36, flags=WSLAY_MSG_MORE
send size=36, flags=0
End even better, it should have provided an option to "cork" the pipe, or configure buffer size so that it may as well produce this single callback call with all 3 messages in a single buffer:
send size=108, flags=0
This lib seems to be fairly popular, no idea how it can be used the way it'd implemented now.
Current wireshark trace with wslay (I use wslay as a client, I use nodejs as a server on port=1122):
If I manually handle WSLAY_MSG_MORE:
And that's the way it should really be implemented to send all messages in a single TCP send:
The reason I discovered the problem was because in stress tests fairly optimized c++ code that used wslay as webscoket backend library was no match to vanilla nodejs websocket server that I wrote in 10 minutes: cpu of my app was always higher than cpu load taken by nodejs server that did way more than simply sending messages
I can confirm this performance issue. A possible fix would be to merge the SEND_HEADER with the SEND_PAYLOAD state.
Any comments @tatsuhiro-t ?
Merging SEND_HEADER with SEND_PAYLOAD seems like it would be a good idea. One way to do that would be to change wslay_event_omsg_non_fragmented_init()
so that the payload data is allocated with (*m)->data = (uint8_t*)malloc(msg_length + 14);
- giving enough space for the header data - then copying the data at position 14 onwards. The header could then reside in the extra 14 bytes at the beginning of the buffer, allowing a single write.
Finally, I made a progress on this. I added wslay_event_write
which writes pending messages to a given buffer instead of calling send_callback. This avoids WSLAY_MSG_MORE and small packet write.
Still it is not ideal because it needs extra data copy. It is slightly more efficient than an application buffers small write into its own buffer. That said, it has less moving parts, it would be rather simple to use. It has different semantics from wslay_event_send
, so in order to adopt it, you need to make some changes to your application. I also made some optimizations by reducing malloc calls and tightens the code with extra compiler flags to modernize aging code base.
The ideal API for me would be a form of wslay_event_send
, where the application provides a heap allocated buffer and wslay takes ownership of that heap memory and frees it when it is sent. Additionally requiring the application to allocate an extra 14 bytes at the beginning of the buffer for the websockets header data, as I mention above, would mean there would be no need to allocate or copy the payload in wslay, and there would be no small writes either.
For reference, wslay_event_write is added by Add wslay_event_write and wslay_frame_write
HI! I've made a little fork of wslay that uses @ralight idea. https://github.com/Softmotions/iwnet/blob/master/src/wslay/wslay_frame.c#L65-L200 The approach is a bit hacky but works well.