tungstenite-rs icon indicating copy to clipboard operation
tungstenite-rs copied to clipboard

allow sending messages with static or shared data

Open icewind1991 opened this issue 3 years ago • 15 comments

This allows the server to skip allocating buffers holding the message data to be send if the data is static.

  • This doesn't really save anything for clients atm due to the need of masking, although changing the masking logic to run on the output buffer instead of the message buffer could solve that.
  • The change in the Message fields data make this a breaking change for any code directly creating or deconstructing the Message enum.
  • Keeping the text and binary methods intact requires adding separate static_* methods, an alternative might be to change text/binary to take an Into<Cow<_>> directly, this would break any code calling those methods with anything that doesn't implement Into<Cow<_>> but should be fine for the most common parameters of String/&str/Vec<u8>/&[u8]

icewind1991 avatar Feb 09 '21 19:02 icewind1991

an alternative might be to change text/binary to take an Into<Cow<_>> directly

Another approach would be to implement https://github.com/snapview/tungstenite-rs/issues/96 similar to the way it was tried to be implemented here: https://github.com/snapview/tungstenite-rs/pull/104 , but to use Bytes as a least common denominator for the type of the data that the message holds. Of course it's not the same as holding 'static data, yet more efficient than storing Vec<u8> or String.

daniel-abramov avatar Feb 12 '21 16:02 daniel-abramov

I've changed things around a bit, switching to custom enums for the binary and string data to allow dealing with shared data trough Bytes and changed the Message::text/Mesage::binary methods to handle both the unique(Vec<u8>/String), shared(Bytes) and static(&'static [u8]/&'static str) cases transparently. I've also made the datatype that stores the message data opaque so things can be changed in the future without any api breaks.

I've currently not added support from shared string data, as there is no way to go from Arc<String> to Bytes without copying, it is still possible to add this later by making the Frame deal with the shared text case seperate.

icewind1991 avatar Feb 13 '21 17:02 icewind1991

I'd eventually like to review this, but I'll probably need a couple more days before I manage so. @jxs, do you want too?

strohel avatar Feb 15 '21 17:02 strohel

I'm trying to understand whether this would make the client side any more inefficient.

I have some mostly finished code that removes the need to make the data a unique copy when applying the mask that I was originally going to put into a follow up pr but I can add it here is desired,

icewind1991 avatar Feb 16 '21 18:02 icewind1991

I have some mostly finished code that removes the need to make the data a unique copy when applying the mask that I was originally going to put into a follow up pr but I can add it here is desired

That would be interesting to check (probably in a separate commit?).

daniel-abramov avatar Apr 16 '21 17:04 daniel-abramov

pushed

icewind1991 avatar Apr 23 '21 19:04 icewind1991

When will this be fully merged? I think it's a very nice feature and doesn't make stuff too complex.

Selyatin avatar Jun 29 '22 20:06 Selyatin

I need to send data that allocated by custom allocator.

Can it be trait instead of enum MessageData, with method to provide &[u8] and Vec<u8>?

Dushistov avatar Jul 20 '22 19:07 Dushistov

When will this be fully merged? I think it's a very nice feature and doesn't make stuff too complex.

It has some unresolved merge conflicts. It seems like replacing the binary bytes with Bytes would be the simplest solution and will take less code, so unless it has some unfortunate consequences for the performance, I think it would be a way to go.

Can it be trait instead of enum MessageData, with method to provide &[u8] and Vec?

Do you mean Message becoming Message<T: AsRef>? In theory, it's of course possible (not sure if it would be particularly ergonomic though). If we used Bytes instead of a Vec<u8>, would it solve your problem?

daniel-abramov avatar Jul 20 '22 19:07 daniel-abramov

If we used Bytes instead of a Vec, would it solve your problem?

I don't see how Bytes can help without unsafe.

I have data allocated with libc::malloc and want to send them via web socket.

Yes, I can transmutate them into &'static [u8] and then call Bytes::from_static. But this is if I really desperate for performance. So Bytes and Vec is not what I need. &[u8] would be nice interface (but tungstenite-rs for some reason may want to modify data?), so trait looks the most promising.

Do you mean Message becoming Message<T: AsRef>

Yes, something like this, it can be type Message = GenericMessage<Vec<u8>>, so the end user don't see the difference.

Dushistov avatar Jul 20 '22 20:07 Dushistov

I don't see how Bytes can help without unsafe. I have data allocated with libc::malloc and want to send them via web socket.

What I mean is that imagine that you're allocating memory with libc::malloc, but store it internally as BytesMut or something like this (might indeed require unsafe), then it would be easy to take Bytes and pass it to tungstenite without copying the data one more time.

&[u8] would be nice interface (but tungstenite-rs for some reason may want to modify data?), so trait looks the most promising.

&[u8] is not a problem, but the question is how are we going to store it (taking into account that the library could be used with Tokio runtime), i.e. when users read messages, the returned Message must own the data. When users write messages, the Message must own something that allows getting a reference to the underlying data in a thread-safe way (so that the Message could be passed between threads).

The only universal solution that I have in mind is to make Message generic over T where T would implement AsRef or something like this. This means that the connect and similar user-friendly methods would use Vec<u8> for T, whereas when the users create sockets by themselves, they can specify any type they want (in your case it would mean wrapping the data into some sort of an Arc though if you're using it with Tokio for instance).

daniel-abramov avatar Jul 20 '22 20:07 daniel-abramov

@Dushistov, so for your particular question the workaround would be to create Vec<u8> from the data that you allocated with libc and then pass it to the library.

The only problem with this approach is that you'll have to copy the data which is not particularly efficient, but if we make the type of Message more generic (as I described in the comment above), that alone won't solve your problem, because whenever we receive a frame from the user, we write it to the send queue (that eventually gets flushed), e.g. imaging calling write_message() 4 times in a row with 4 different messages. If they all end up being in a queue, tungstenite will have to own the data inside to be able to send them.

In the case of this PR, it's mitigated by making the Frame hold Bytes (e.g. a strong shared reference to the original data). In other words: you'll have to place your data inside Bytes anyway (or inside another Arc based structure, but that would effectively be a reinvention of a Bytes 🙂).

daniel-abramov avatar Jul 20 '22 20:07 daniel-abramov

so for your particular question the workaround would be to create Vec

I am already doing so. But this looks like waste, C++ allocate buffer, copy data to it transfer ownership to Rust code, Rust code then again allocate memory via Vec, copy to it, and then give it to tungstenite-rs.

If they all end up being in a queue, tungstenite will have to own the data inside to be able to send them.

if tungstenite only want to own data then this is not problem, wraper around allocated data has proper Drop implementation that call libc::free, so it is actually like Vec<u8> with custom allocator and without capacity field.

The only universal solution that I have in mind is to make Message generic

generic Message simplify thing, it would be possible to have two Message, one for sending, and one for receiving.

Dushistov avatar Jul 20 '22 22:07 Dushistov

if tungstenite only want to own data then this is not problem, wraper around allocated data has proper Drop implementation that call libc::free, so it is actually like Vec with custom allocator and without capacity field.

Ok, I think I finally got your point :) So basically you don't really need sending the shared data (like this PR mentions), you're more concerned about the fact that currently you can't pass a custom T: AsRef<[u8]> structure to the Message, right?

generic Message simplify thing, it would be possible to have two Message, one for sending, and one for receiving.

Hm, not quite. I mean, of course we could define different data types for incoming and outgoing message (strictly speaking there is nothing wrong with it), but we would need to update quite a few things to make it ergonomic with tokio-tungstenite then (e.g. won't be able to use StreamExt::split() and similar things), but there are some solutions for this available though.

daniel-abramov avatar Jul 24 '22 16:07 daniel-abramov