tungstenite-rs
tungstenite-rs copied to clipboard
Allow sending of shared data
I think that it would make sense that Message
be instead:
pub enum Message {
Text(Arc<String>),
Binary(Arc<Vec<u8>>), // maybe taking `Bytes` from the `bytes` crate would be cleaner.
Ping(Vec<u8>),
Pong(Vec<u8>),
Close(Option<CloseFrame<'static>>),
}
This is because the websocket transport cannot guarantee messages to not be lost (from what I understand). Thus someone who wants to prevent messages from being lost would keep a list of messages and only consider them "sent" when they are ack'd by the peer. Using a Arc
would greatly reduce the cost of doing so since clone operations would be more efficient.
Hmm,
given your other issue, I understand your concern, but I think your usecase is rather rare. Thus putting everybodies message data always in an Arc will incur an overhead all the time for everybody that doesn't need this.
I would suggest that a connection loss is an exceptional and erroneous situation, in which case it's generally acceptable to have a performance cost. What I'm suggesting is that you regenerate the messages that haven't been acknowledged in this situation. This can happen concurrently to establishing a new connection.
That being said, maybe using a data type like Bytes
instead of Vec<u8>
would open up options for copyless scenarios, including this one? Not sure about the performance trade-offs with Bytes
.
Thus putting everybodies message data always in an Arc will incur an overhead all the time for everybody that doesn't need this.
I would suggest that a connection loss is an exceptional and erroneous situation, in which case it's generally acceptable to have a performance cost. What I'm suggesting is that you regenerate the messages that haven't been acknowledged in this situation.
In my experience, a TCP crash is only a matter of time, especially for long running connections (16bit checksum, etc.). Thus if you want reliable messaging you need to keep a copy of each message locally until it is ack'd. Currently, you need to clone a Vec<u8>
each time to do so, which is a massive additional overhead just to have reliability. So while it is true that the crash itself is not part of the hot path, the mechanism to ensure reliability very much is.
That being said, maybe using a data type like Bytes instead of Vec
would open up options for copyless scenarios, including this one? Not sure about the performance trade-offs with Bytes.
I know that this is the approach that tokio takes
(look at Bytes and BytesMut). The Bytes
is an Arc<u8>
plus some other convenience stuff. It would be interesting to benchmark to determine if this has a significant performance impact. My hypothesis is that the cost should be minimal compared to the I/O overhead.
Currently, you need to clone a
Vec<u8>
each time to do so, which is a massive additional overhead just to have reliability.
Sure, if you can't regenerate them just in case of failure, that's not nice for memory consumption.
What I hope by suggesting using Bytes is that we might get ecosystem wide possibility of copyless end-to-end. From where the data enters the process to where it leaves. There is an interesting new proposal for the AsyncRead/AsyncWrite traits in a PR on tokio that could really make that happen.
There is an interesting new proposal for the AsyncRead/AsyncWrite traits in a PR on tokio that could really make that happen.
That's cool, I'll take a look.
@najamelan What are your thoughts on having a asymettric Message
type? Something like:
pub enum SendMsg {
Text(String),
ShText(Arc<String>),
Binary(Vec<u8>),
ShBinary(Bytes),
Ping(Vec<u8>),
Pong(Vec<u8>),
Close(Option<CloseFrame<'static>>),
}
pub enum RecvMsg {
Text(String),
Binary(Vec<u8>),
Ping(Vec<u8>),
Pong(Vec<u8>),
Close(Option<CloseFrame<'static>>),
}
This would enable zero-copy scenarios while not forcing the user to use Bytes
.
An alternative would be to provide a separate WebSocket::write_shared_message
API. This would preserve backward compat.
enum SharedMessage {
Binary(Bytes),
}
Can't we instead change Message
to:
pub enum Message<'a> {
Text(&'a str),
Binary(&'a [u8]),
Ping(&'a [u8]),
Pong(&'a [u8]),
Close(Option<CloseFrame<'static>>),
}
This would allow to cache messages if needed without cloning data and will allow users to remove unnecessary allocations, e.g. if messages are fixed or with if reusable buffer is used.
The same type can be used for receiving messages as well, with the following signature changes:
// this function will use an inner buffer stored inside `WebSocket` instance
pub fn read_message<'a>(&'a mut self) -> Result<Message<'a>> { .. }
// we also could add function like this
pub fn read_message_to<'a>(&mut self, buf: &'a Vec<u8>) -> Result<Message<'a>> { .. }
This would allow additional optimizations, although it may complicate code a bit (especially for async).
Not for receiving. The incoming message may be split across multiple packets. It may work with Cow
but would still require copying in many cases.
But it would work with read_message_to
, correct? I think introducing an additional "reference" message type (in addition to the current "owned" one) and additional read/write methods for it is a better option compared to introducing additional message variants.
In the current implementation it would be not more efficient than just copying the vector afterwards. We keep the half-received message in a Vec
and return it by move as it gets complete.
I also proposed some idea in a comment to https://github.com/snapview/tungstenite-rs/pull/104, perhaps it would also be a solution.
I'll give #104 another try in a couple of weeks. I'm thinking about writing a decent benchmark to view the overhead of using Bytes
over a Vec<u8>
.
-
If the overhead is significant I'm thinking of using @application-developer-DA's approach and
[modify the] Message structure, so that the Message::Binary accepts some Payload type (and we can provideFrom implementation to support Bytes and Vec
) [...] -
If the overhead is minimal then maybe replacing
Vec<u8>
directly byBytes
makes sense.
Note that both cases would lead to a breaking change.
Is there any updates?
I think replace Vec<u8> by Bytes is a better choice, even if there is a breaking change.
@x1957 , awaiting a feedback from @agalakhov on https://github.com/snapview/tungstenite-rs/pull/104#pullrequestreview-345219851
Hello, i am new to this and very interested. Maybe it would be possible to add a Shared
variant to the enum which just stores an Arc<Message>
this would have the downside that a shared message could be "reshared" but it wouldn't be too hard to implement and then the Arc could handled internally. This also wouldn't incur overhead for everyone but only the users of the shared variant. This crates API is very handy, in my use case i want to create broadcast messages using websockets and cloning all the data is noticeable overhead.
There is #104 which is still open.
Perhaps the best and simplest solution would be the one that has already been suggested by contributors in this issue, e.g. usage of Bytes
instead of Vec<u8>
.
Hello. Broadcasting data to multiple connections like (10k connections) would be very inefficient since we need to copy message 10k times. I vote for solution with Bytes