tls TLS streams are not guaranteed to be "split" (full-duplex) safe

Currently every TLS stream can be split into a reading half and a writing half using tokio::util::split. This operation is however not always guaranteed to be correct and safe to perform:

The reason here is that with TLS reading and writing on a stream are not guaranteed to be decoupled, since TLS streams also transfer control data besides application data. Due to this, performing a read operation on the TLS stream might trigger a write operation on the socket (e.g. for sending a key update or alert), and the other way around.

This property can break assumptions that tokio/tokio-tls and the libraries make at the moment. As one example:

The user performs a read, which triggers an alert to be sent to the peer
The TLS library performs a write on the underlying socket, which report a blocked status. It forwards that blocked status to the application.
The task yields and waits to get woken up based on read readiness. Which might not happen, since the task which last registered for write readiness might be woken up. So the reading half would be stuck.
The last point happens if the read task doesn't install its Waker for write readiness. If it instead identifies being write blocked, and overwrites the write Waker the situation is not necessarily better. Now a concurrent write operation might instead be starved, since the Waker is gone.

What will exactly happen or won't happen is a bit of a property of the TLS library, and might therefore be handled completely different by rustls, openssl, schannel and security-framework. Therefore it's rather hard to describe what exactly "could go wrong" if someone tries to split a TLS stream.

With rustls reading and writing onto actual sockets is currently purely handled by tokio-tls, therefore its the most easy thing to grasp. Here any read or write operations in the wrapper simply don't seem to care whether rustls wants to perform IO in the opposite direction. That might not lead to Waker stealing and tasks getting starved. However I guess it could lead to a delay in sending certain updates. @ctz might know more whether this is problematic.

With native-tls + openssl it is very likely that it will perform IO in the opposite direction and handle readiness notifications wrong. https://dzone.com/articles/using-openssl-with-libuv provides a bit of information what needs to be done to handle those cases, but I think native-tls doesn't since it purely forwards all TLS calls to openssl which uses the fd to perform IO. From there on things rely on mio to report readiness, which only cares about the sockets read/write state and not the TLS streams read/write state.

How can this be fixed

It's unfortunately not easy. To avoid people running into starved streams, one solution is to get rid of tokio::util::split and hightlight in docs that streams support only one common Waker for both read and write operations. But that's not making streams full duplex.

I think to really enable full-duplex the following things could be done:

Make sure the TLS libraries don't directly perform IO, they just get fed incoming data, and buffer outgoing data (or write it to a buffered writer). If that buffer is full they exercise backpressure on the caller. That is kind of what rustls is already doing.
Besides the applications read and write task, there is a TLS IO task which makes sure that if there is any buffered outgoing data (either produced via a read or write operation) that data gets written to a socket. Only that task deals with OS readiness notifications. If sufficient data had been flushed, it wakes up the potentially blocked application write task. If new data from the socket had been received and decoded via the TLS library, it wakes up the read task.

This is kind of also the model that other multiplexing solutions - like HTTP/2 - use in their implementations. The downside is obviously the need for spawning as task, and potential overhead of task-switching which can degrade performance - especially if a multithreaded runtime is used, which would require synchronization between the application and TLS IO task. It also makes the solution less runtime agnostic, and harder to force-cancel ongoing IO.

Nov 27 '20 23:11 Matthias247

I had another thought on how to work around this in a generic fashion:

The split method could generate a new Waker type, similar to how FuturesUnordered works. I will call it VirtualWaker.
On all calls to TlsStream::read/write, the application tasks Waker would get stored inside the generated VirtualWaker, and the VirtualWaker would get forwarded to the call to the TLS library.
VirtualWaker::wake is implemented by waking up the applications read and write task, which can be done by wakeing all Wakers handles which are stored inside the generated VirtualWaker.

That would lead to spurious wakeups on the opposite direction and 50% wasted syscalls, but the same also happens if someone performs concurrent reads/writes on a TlsStream using select!/join!/etc.

Nov 27 '20 23:11 Matthias247

I believe that tokio-rustls will hardly have such problems. The only place it performs IO in the opposite direction is when it tries to send an alert when an error occurs.

but the old version of tokio-rustls and some fork based on it do have this problem. like async-tls

Nov 28 '20 04:11 quininer

I agree this should be safe, since this is a terminal condition anyway.

The downside of this model is that if rustls itself would require e.g. sending re-negotations or other control data on reads tokio-tls would currently not perform this. But as far as I understand those things are not supported by rustls.

It might be more tricky to find out which ones of the native-tls versions would be safe to split, since they could all behave different.

I also looked at async-tls which you linked. That one indeed seems generally not safe to split, since it always tries to perform opposite direction IO after any interaction with it. But I'm also not sure if the async-std ecosystem has an equivalent generic split method like tokio::io::split which would trigger the issue.

Nov 29 '20 02:11 Matthias247

It sounds to me that the TLS streams have buggy implementation of the traits.

Another way to fix this would be for the TlsStream to hold its own read/write wakers and when calling the inner stream, it passes a weaker that fans out notification to both the read / write half. This will result in spurious wakeups but will fix the implmentations.

Nov 30 '20 18:11 carllerche

cc @sfackler

Nov 30 '20 19:11 carllerche

This sounds dangerous and can render split unusable...Any progress on this? I happened to want to use split

Dec 20 '21 14:12 rapiz1

Hi, would it be a good idea the library provide callback functions for TLS control data? when TLS readhalf receive tls control data, it dispatch to callback instead of application?

Dec 13 '22 22:12 gftea

dose still has this problem now?

Jun 09 '23 05:06 stanal

tls tls copied to clipboard

TLS streams are not guaranteed to be "split" (full-duplex) safe

How can this be fixed

tls
tls copied to clipboard