tungstenite-rs icon indicating copy to clipboard operation
tungstenite-rs copied to clipboard

Allow sending of shared data

Open jean-airoldie opened this issue 5 years ago • 33 comments

I think that it would make sense that Message be instead:

pub enum Message {
    Text(Arc<String>),
    Binary(Arc<Vec<u8>>), // maybe taking `Bytes` from the `bytes` crate would be cleaner.
    Ping(Vec<u8>),
    Pong(Vec<u8>),
    Close(Option<CloseFrame<'static>>),
}

This is because the websocket transport cannot guarantee messages to not be lost (from what I understand). Thus someone who wants to prevent messages from being lost would keep a list of messages and only consider them "sent" when they are ack'd by the peer. Using a Arc would greatly reduce the cost of doing so since clone operations would be more efficient.

jean-airoldie avatar Nov 26 '19 15:11 jean-airoldie

Hmm,

given your other issue, I understand your concern, but I think your usecase is rather rare. Thus putting everybodies message data always in an Arc will incur an overhead all the time for everybody that doesn't need this.

I would suggest that a connection loss is an exceptional and erroneous situation, in which case it's generally acceptable to have a performance cost. What I'm suggesting is that you regenerate the messages that haven't been acknowledged in this situation. This can happen concurrently to establishing a new connection.

That being said, maybe using a data type like Bytes instead of Vec<u8> would open up options for copyless scenarios, including this one? Not sure about the performance trade-offs with Bytes.

najamelan avatar Nov 28 '19 13:11 najamelan

Thus putting everybodies message data always in an Arc will incur an overhead all the time for everybody that doesn't need this.

I would suggest that a connection loss is an exceptional and erroneous situation, in which case it's generally acceptable to have a performance cost. What I'm suggesting is that you regenerate the messages that haven't been acknowledged in this situation.

In my experience, a TCP crash is only a matter of time, especially for long running connections (16bit checksum, etc.). Thus if you want reliable messaging you need to keep a copy of each message locally until it is ack'd. Currently, you need to clone a Vec<u8> each time to do so, which is a massive additional overhead just to have reliability. So while it is true that the crash itself is not part of the hot path, the mechanism to ensure reliability very much is.

That being said, maybe using a data type like Bytes instead of Vec would open up options for copyless scenarios, including this one? Not sure about the performance trade-offs with Bytes.

I know that this is the approach that tokio takes(look at Bytes and BytesMut). The Bytes is an Arc<u8> plus some other convenience stuff. It would be interesting to benchmark to determine if this has a significant performance impact. My hypothesis is that the cost should be minimal compared to the I/O overhead.

jean-airoldie avatar Nov 28 '19 14:11 jean-airoldie

Currently, you need to clone a Vec<u8> each time to do so, which is a massive additional overhead just to have reliability.

Sure, if you can't regenerate them just in case of failure, that's not nice for memory consumption.

What I hope by suggesting using Bytes is that we might get ecosystem wide possibility of copyless end-to-end. From where the data enters the process to where it leaves. There is an interesting new proposal for the AsyncRead/AsyncWrite traits in a PR on tokio that could really make that happen.

najamelan avatar Nov 29 '19 15:11 najamelan

There is an interesting new proposal for the AsyncRead/AsyncWrite traits in a PR on tokio that could really make that happen.

That's cool, I'll take a look.

jean-airoldie avatar Nov 29 '19 15:11 jean-airoldie

@najamelan What are your thoughts on having a asymettric Message type? Something like:

pub enum SendMsg {
    Text(String),
    ShText(Arc<String>),
    Binary(Vec<u8>),
    ShBinary(Bytes),
    Ping(Vec<u8>),
    Pong(Vec<u8>),
    Close(Option<CloseFrame<'static>>),
}

pub enum RecvMsg {
    Text(String),
    Binary(Vec<u8>),
    Ping(Vec<u8>),
    Pong(Vec<u8>),
    Close(Option<CloseFrame<'static>>),
}

This would enable zero-copy scenarios while not forcing the user to use Bytes.

jean-airoldie avatar Jan 14 '20 19:01 jean-airoldie

An alternative would be to provide a separate WebSocket::write_shared_message API. This would preserve backward compat.

enum SharedMessage {
    Binary(Bytes),
}

jean-airoldie avatar Jan 14 '20 21:01 jean-airoldie

Can't we instead change Message to:

pub enum Message<'a> {
    Text(&'a str),
    Binary(&'a [u8]),
    Ping(&'a [u8]),
    Pong(&'a [u8]),
    Close(Option<CloseFrame<'static>>),
}

This would allow to cache messages if needed without cloning data and will allow users to remove unnecessary allocations, e.g. if messages are fixed or with if reusable buffer is used.

The same type can be used for receiving messages as well, with the following signature changes:

// this function will use an inner buffer stored inside `WebSocket` instance
pub fn read_message<'a>(&'a mut self) -> Result<Message<'a>> { .. }
// we also could add function like this
pub fn read_message_to<'a>(&mut self, buf: &'a Vec<u8>) -> Result<Message<'a>> { .. }

This would allow additional optimizations, although it may complicate code a bit (especially for async).

newpavlov avatar Mar 05 '20 19:03 newpavlov

Not for receiving. The incoming message may be split across multiple packets. It may work with Cow but would still require copying in many cases.

agalakhov avatar Mar 05 '20 23:03 agalakhov

But it would work with read_message_to, correct? I think introducing an additional "reference" message type (in addition to the current "owned" one) and additional read/write methods for it is a better option compared to introducing additional message variants.

newpavlov avatar Mar 06 '20 17:03 newpavlov

In the current implementation it would be not more efficient than just copying the vector afterwards. We keep the half-received message in a Vec and return it by move as it gets complete.

agalakhov avatar Mar 06 '20 21:03 agalakhov

I also proposed some idea in a comment to https://github.com/snapview/tungstenite-rs/pull/104, perhaps it would also be a solution.

daniel-abramov avatar Mar 16 '20 18:03 daniel-abramov

I'll give #104 another try in a couple of weeks. I'm thinking about writing a decent benchmark to view the overhead of using Bytes over a Vec<u8>.

  • If the overhead is significant I'm thinking of using @application-developer-DA's approach and

    [modify the] Message structure, so that the Message::Binary accepts some Payload type (and we can provideFrom implementation to support Bytes and Vec) [...]

  • If the overhead is minimal then maybe replacing Vec<u8> directly by Bytes makes sense.

Note that both cases would lead to a breaking change.

jean-airoldie avatar Mar 17 '20 11:03 jean-airoldie

Is there any updates?

I think replace Vec<u8> by Bytes is a better choice, even if there is a breaking change.

x1957 avatar Jun 29 '20 06:06 x1957

@x1957 , awaiting a feedback from @agalakhov on https://github.com/snapview/tungstenite-rs/pull/104#pullrequestreview-345219851

daniel-abramov avatar Jul 22 '20 18:07 daniel-abramov

Hello, i am new to this and very interested. Maybe it would be possible to add a Shared variant to the enum which just stores an Arc<Message> this would have the downside that a shared message could be "reshared" but it wouldn't be too hard to implement and then the Arc could handled internally. This also wouldn't incur overhead for everyone but only the users of the shared variant. This crates API is very handy, in my use case i want to create broadcast messages using websockets and cloning all the data is noticeable overhead.

DrSloth avatar Dec 12 '21 01:12 DrSloth

There is #104 which is still open.

Perhaps the best and simplest solution would be the one that has already been suggested by contributors in this issue, e.g. usage of Bytes instead of Vec<u8>.

daniel-abramov avatar Jan 04 '22 18:01 daniel-abramov

Hello. Broadcasting data to multiple connections like (10k connections) would be very inefficient since we need to copy message 10k times. I vote for solution with Bytes

amykhaylyshyn avatar Feb 07 '22 14:02 amykhaylyshyn