s2n-tls
s2n write APIs do not return the number of actually copied bytes, and thereby allow for Stream corruption
Problem:
The API docs for `s2n_send` and the vectored send functions mention:

> NOTE: Unlike OpenSSL, repeated calls to s2n_send should not duplicate the original parameters, but should update buf and size per the indication of size written. For example;

This gives the impression that the s2n send APIs act like every other POSIX-like write API: they return the number of bytes that were actually processed, and with such APIs the next write can provide any other data.
For s2n this is however not the case: calling `s2n_send` will copy a certain number of bytes into its internal buffer (a `stuffer`). It will however not return the number of copied bytes. Instead it will try to flush the buffer onto the underlying writer, and return the number of bytes that have been flushed to the wire. The number of copied but not yet flushed bytes is tracked in `current_user_data_consumed`.

On the next `s2n_send` attempt, s2n will ignore the first `current_user_data_consumed` bytes the caller passes. It assumes those bytes are the same as in the previous `s2n_send` call, since it told the application that those bytes had not been written.
However this fact is neither documented nor visible to the application.
This behavior makes no difference if the application passes exactly the same buffer again in the next `s2n_sendv` call, or if it only appends new buffers at the end of the IO vector.
However it will cause issues if the application passes a different buffer in the next `s2n_send` call: in this case s2n will ignore the start of the new buffer and never send it, and will instead send the end of the previous buffer, even though it never informed the application that it consumed those bytes.
Here is an example using vectored writes:
- Application calls `s2n_sendv([buffer1[0..200], buffer2[0..50], buffer3[0..100]])`
- s2n copies all 350 bytes of data into its `stuffer`
- s2n tries to flush the bytes. It was able to send 220 bytes before `EAGAIN` was returned. This means 130 bytes (the rest of buffer2 and all of buffer3) stay buffered
- s2n returns 220 to the caller
- Application calls `s2n_sendv([buffer2[20..50], buffer4[0..200]])`. The application here wants its buffers to be written atomically - therefore it resubmits `buffer2`, which according to its understanding had only been partially written. Since `buffer3` was never touched according to its understanding, it replaces it with `buffer4`, which now has a higher priority.
- s2n ignores the first 130 bytes of this second write call, since it assumes those are the same as the not yet confirmed 130 bytes.
- s2n copies the remaining 100 bytes (which are the end of `buffer4`) into a new record
- s2n flushes all data. Since the socket does not block this time, it can write all data. It will return 230.
As the end result, the bytes actually written to the wire are:
- buffer1[0..200]
- buffer2[0..50]
- buffer3[0..100]
- buffer4[100..200]
However the application assumes they were:
- buffer1[0..200]
- buffer2[0..50]
- buffer4[0..200]
==> The data stream got corrupted: all of `buffer3` and only the second half of `buffer4` got written, without the application's awareness.
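In code, the problematic sequence looks roughly like the following sketch (my illustration of the example above; buffer names and sizes are taken from the example, error handling and the poll/wait between the two calls are elided):

```c
#include <stdint.h>
#include <sys/uio.h>
#include <s2n.h>

/* Sketch of the two calls from the example above. buffer1..buffer4 are
 * application-owned byte arrays of the sizes used in the example. */
void example(struct s2n_connection *conn,
             uint8_t *buffer1, uint8_t *buffer2, uint8_t *buffer3, uint8_t *buffer4)
{
    s2n_blocked_status blocked = S2N_NOT_BLOCKED;

    struct iovec first[3] = {
        { .iov_base = buffer1, .iov_len = 200 },
        { .iov_base = buffer2, .iov_len = 50 },
        { .iov_base = buffer3, .iov_len = 100 },
    };
    /* In the scenario above this returns 220; 130 bytes (the rest of buffer2
     * plus buffer3) stay buffered inside s2n without being reported. */
    ssize_t r1 = s2n_sendv(conn, first, 3, &blocked);

    /* The application believes only buffer1 and buffer2[0..20] went out, so it
     * re-queues the rest of buffer2 and substitutes buffer4 for buffer3. */
    struct iovec second[2] = {
        { .iov_base = buffer2 + 20, .iov_len = 30 },
        { .iov_base = buffer4, .iov_len = 200 },
    };
    /* s2n skips the first 130 bytes of `second` (rest of buffer2 plus
     * buffer4[0..100]), flushes the previously buffered end of buffer2 and all
     * of buffer3 instead, and only copies buffer4[100..200]. buffer4[0..100]
     * is silently never sent. */
    ssize_t r2 = s2n_sendv(conn, second, 2, &blocked);

    (void) r1;
    (void) r2;
}
```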
This kind of behavior will be encountered if the application level protocol is message based rather than a pure stream, and if the application tries to dynamically reorder messages for each write call in order to guarantee prioritization and efficiency. Examples of such protocols are HTTP/2, MQTT or RPC protocols like Thrift.
Proposed Solution:
Documentation
This behavior is apparently the same as OpenSSL's behavior - therefore people familiar with OpenSSL might expect it and be prepared to handle it.
The OpenSSL docs explicitly mention:

> WARNINGS
> When a write function call has to be repeated because SSL_get_error(3) returned SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, it must be repeated with the same arguments. The data that was passed might have been partially processed.
However for s2n it is not explicitly documented.
The minimum that should be done for s2n is to document this behavior very explicitly. A note should mention that any follow-up call to an `s2n_send` API must repeat the data it previously tried to write.
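To make that note concrete, the documentation could include something like the following sketch (my illustration, not existing s2n documentation): the only safe pattern under the current contract is to keep resubmitting exactly the same remaining bytes until everything has been reported as written.

```c
#include <stdint.h>
#include <s2n.h>

/* Sketch: send `len` bytes from `data`, always retrying with the SAME
 * remaining bytes. Substituting different data between calls can corrupt
 * the stream, because s2n may already have buffered (but not reported)
 * part of the previous submission. */
int send_all(struct s2n_connection *conn, const uint8_t *data, ssize_t len)
{
    ssize_t written = 0;
    s2n_blocked_status blocked = S2N_NOT_BLOCKED;

    while (written < len) {
        ssize_t w = s2n_send(conn, data + written, len - written, &blocked);
        if (w < 0) {
            if (s2n_error_get_type(s2n_errno) == S2N_ERR_T_BLOCKED) {
                /* Wait for the socket to become writable (e.g. via poll()),
                 * then retry with the same remaining range: data + written. */
                continue;
            }
            return -1; /* hard error */
        }
        written += w;
    }
    return 0;
}
```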
Better APIs
While improving the docs is a step forward, I actually think more misuse-resistant APIs - which are a core goal of s2n - might be a better alternative. The s2n connection represents a bidirectional byte stream. It should therefore behave, API-wise, like other Stream APIs do:
- Writes on a Stream return the number of bytes transferred from the caller's buffer into the underlying Stream.
- The Stream might be buffered - which means it is not necessarily guaranteed that the Stream has flushed all its data. Flushing can instead happen during a follow-up `write()`, or during an explicit `flush()` call.
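As an analogy only, buffered stdio already separates these two steps: the write reports what was copied into the buffer, and an explicit flush pushes it to the underlying descriptor.

```c
#include <stdio.h>
#include <string.h>

/* Analogy: fwrite() reports how many bytes were copied into the FILE's
 * internal buffer; they may not have reached the fd yet. An explicit
 * fflush() (or a later write/close) pushes the buffered bytes out. */
int buffered_write_example(FILE *fp, const char *msg)
{
    size_t copied = fwrite(msg, 1, strlen(msg), fp); /* bytes accepted into the buffer */
    if (copied < strlen(msg)) {
        return -1; /* short write into the buffer */
    }
    return fflush(fp); /* 0 on success: everything reached the underlying fd */
}
```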
That behavior matches other buffered write APIs. As discussed however, this kind of API also introduces some new challenges:
- If we change the contract of the `write` API to only return the number of copied bytes, then it can no longer convey that an internal flush did not run to completion because the underlying stream returned `EAGAIN`. So some bytes stay buffered in S2N without the application knowing about it, and the socket status might have changed to non-writable without the application being aware of it. The application would only notice this at the next write call, or during `shutdown`. This seems OK for normal writes. However, e.g. for keep-alive connections, applications also have to flush remaining data to the socket without wanting to close the socket. The `flush` API is such an API.
- The `flush` API of s2n is however so far not exported.
- Even if `flush` were exported, applications would need to be changed to make use of it and to deal with the fact that the `write` return value does not reflect socket readiness anymore. We might need to communicate socket readiness - and potentially the amount of buffered bytes inside S2N - in a side channel.
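To illustrate the direction (purely hypothetical; none of the following functions, names or signatures exist in s2n-tls today), a more stream-like surface with an explicit flush and a buffered-bytes side channel could look roughly like this:

```c
#include <stdint.h>
#include <s2n.h>

/* HYPOTHETICAL sketch, not part of s2n-tls.
 *
 * Copies up to `size` bytes from `buf` into s2n's internal buffer and
 * returns the number of bytes copied. Bytes reported here never have to
 * be resubmitted; whether they reached the socket is a separate concern. */
ssize_t s2n_send_buffered(struct s2n_connection *conn, const void *buf,
                          ssize_t size, s2n_blocked_status *blocked);

/* HYPOTHETICAL sketch, not part of s2n-tls.
 *
 * Tries to flush internally buffered records to the socket. On return,
 * `bytes_still_buffered` (the side channel discussed above) reports how
 * much data remains queued inside s2n. */
int s2n_flush(struct s2n_connection *conn, s2n_blocked_status *blocked,
              uint32_t *bytes_still_buffered);
```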
"We might need to communicate socket readiness - and potentially the amount of buffered bytes inside S2N - in a side channel."
Why not just add an extra `uint32_t *num_buffered_bytes_out` parameter?
If the fix is an API change, would this need to be an "s2n_sendv2"? or some other new name, to maintain back compat
> If the fix is an API change, would this need to be an "s2n_sendv2"? or some other new name, to maintain back compat
If the current API is actually not meeting the behavior expected by customers, I would be more inclined to fix it. However, if the fix might break customers' integration with the library, then we have no choice other than delivering a new API and flagging the old one as deprecated.
> Even if `flush` were exported, applications would need to be changed to make use of it and to deal with the fact that the `write` return value does not reflect socket readiness anymore. We might need to communicate socket readiness - and potentially the amount of buffered bytes inside S2N - in a side channel.
This might be a requirement for TLS1.3 even without changing the behavior of s2n_send: https://github.com/aws/s2n-tls/issues/3504
> If the current API is actually not meeting the behavior expected by customers, I would be more inclined to fix it. However, if the fix might break customers' integration with the library, then we have no choice other than delivering a new API and flagging the old one as deprecated.
We might be able to add a config setting to change the buffering behavior. That's how we fixed a similar "very old and now accepted use" bug on the receive side: https://github.com/aws/s2n-tls/commit/2d5ddb3ae0bcc9b01d5d71906a604e489a56d547 It's not ideal, but it's an option.
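If the config-setting route were taken, the opt-in might look something like the following sketch (the function name and flag are hypothetical; no such setter exists today):

```c
#include <stdbool.h>
#include <s2n.h>

/* HYPOTHETICAL sketch, not an existing s2n-tls API: a config flag that opts
 * connections into "return the number of copied bytes" send semantics,
 * similar in spirit to how the receive-side buffering fix was gated. */
int s2n_config_set_send_reports_copied_bytes(struct s2n_config *config, bool enabled);
```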