s2n-tls
s2n write APIs do not return the number of actually copied bytes, and thereby allow for Stream corruption
Problem:
The API docs for `s2n_send` and the vectored send functions mention:

> NOTE: Unlike OpenSSL, repeated calls to s2n_send should not duplicate the original parameters, but should update buf and size per the indication of size written. For example;

This gives the impression that the s2n send APIs act like every other POSIX-like write API: they return the number of bytes that were actually processed, and with such APIs the next write can provide any other data.
For s2n this is however not the case: calling `s2n_send` will copy a certain number of bytes into its internal buffer (a `stuffer`). It will however not return the number of copied bytes. Instead it will try to flush the buffer onto the underlying writer, and return the number of bytes that have been flushed to the wire. The number of copied but not yet flushed bytes is tracked in `current_user_data_consumed`.

On the next `s2n_send` attempt, s2n will ignore the first `current_user_data_consumed` bytes the caller passes. It assumes those bytes are the same as in the previous `s2n_send` call, since it told the application that those bytes had not been written.
However this fact is neither documented nor visible to the application.
This behavior makes no difference if the application passes exactly the same buffer again in the next `s2n_sendv` call, or if it only appends new buffers at the end of the IO vector.
However it will cause issues if the application passes a different buffer in the next `s2n_send` call: in this case s2n will ignore the start of the new buffer and never send it, and will instead send the end of the previous buffer, even though it never informed the application that it consumed those bytes.
Here is an example using vectored writes:
- Application calls `s2n_sendv([buffer1[0..200], buffer2[0..50], buffer3[0..100]])`
- s2n copies all 350 bytes of data into its `stuffer`
- s2n tries to flush the bytes. It was able to send 220 bytes before `EAGAIN` was returned. This means 130 bytes (the rest of buffer2 and all of buffer3) stay buffered
- s2n returns 220 to the caller
- Application calls `s2n_sendv([buffer2[20..50], buffer4[0..200]])`. The application here wants its buffers to be written atomically - therefore it resubmits `buffer2`, which according to its understanding had only been partially written. Since `buffer3` was never touched according to its understanding, it replaces it with `buffer4`, which now has a higher priority.
- s2n ignores the first 130 bytes of this second write call, since it assumes those are the same as the not yet confirmed 130 bytes.
- s2n copies the remaining 100 bytes (which are the end of `buffer4`) into a new record
- s2n flushes all data. Since the socket does not block this time, it can write all data. It will return 230.
As the end result, the bytes actually written to the wire are:
- buffer1[0..200]
- buffer2[0..50]
- buffer3[0..100]
- buffer4[100..200]
However the application assumes they were:
- buffer1[0..200]
- buffer2[0..50]
- buffer4[0..200]
==> The data stream got corrupted: all of `buffer3` and only the second half of `buffer4` got written, without the application's awareness.
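In code, the problematic sequence looks roughly like the following sketch (my illustration of the example above; buffer names and sizes are taken from the example, error handling and the poll/wait between the two calls are elided):

```c
#include <stdint.h>
#include <sys/uio.h>
#include <s2n.h>

/* Sketch of the two calls from the example above. buffer1..buffer4 are
 * application-owned byte arrays of the sizes used in the example. */
void example(struct s2n_connection *conn,
             uint8_t *buffer1, uint8_t *buffer2, uint8_t *buffer3, uint8_t *buffer4)
{
    s2n_blocked_status blocked = S2N_NOT_BLOCKED;

    struct iovec first[3] = {
        { .iov_base = buffer1, .iov_len = 200 },
        { .iov_base = buffer2, .iov_len = 50 },
        { .iov_base = buffer3, .iov_len = 100 },
    };
    /* In the scenario above this returns 220; 130 bytes (the rest of buffer2
     * plus buffer3) stay buffered inside s2n without being reported. */
    ssize_t r1 = s2n_sendv(conn, first, 3, &blocked);

    /* The application believes only buffer1 and buffer2[0..20] went out, so it
     * re-queues the rest of buffer2 and substitutes buffer4 for buffer3. */
    struct iovec second[2] = {
        { .iov_base = buffer2 + 20, .iov_len = 30 },
        { .iov_base = buffer4, .iov_len = 200 },
    };
    /* s2n skips the first 130 bytes of `second` (rest of buffer2 plus
     * buffer4[0..100]), flushes the previously buffered end of buffer2 and all
     * of buffer3 instead, and only copies buffer4[100..200]. buffer4[0..100]
     * is silently never sent. */
    ssize_t r2 = s2n_sendv(conn, second, 2, &blocked);

    (void) r1;
    (void) r2;
}
```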
This kind of behavior will be encountered if the application level protocol is message based rather than a pure stream, and if the application tries to dynamically reorder messages for each write call in order to guarantee prioritization and efficiency. Examples of such protocols are HTTP/2, MQTT or RPC protocols like Thrift.
Proposed Solution:
Documentation
This behavior is apparently the same as OpenSSL's behavior - therefore people familiar with OpenSSL might expect it and be prepared to handle it.
The OpenSSL docs explicitly mention:

> WARNINGS
> When a write function call has to be repeated because SSL_get_error(3) returned SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, it must be repeated with the same arguments. The data that was passed might have been partially processed.
However for s2n it is not explicitly documented.
The minimum that should be done for s2n is to document this behavior very explicitly. A note should mention that any follow-up call to an `s2n_send` API must repeat the data it previously tried to write.
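To make that note concrete, the documentation could include something like the following sketch (my illustration, not existing s2n documentation): the only safe pattern under the current contract is to keep resubmitting exactly the same remaining bytes until everything has been reported as written.

```c
#include <stdint.h>
#include <s2n.h>

/* Sketch: send `len` bytes from `data`, always retrying with the SAME
 * remaining bytes. Substituting different data between calls can corrupt
 * the stream, because s2n may already have buffered (but not reported)
 * part of the previous submission. */
int send_all(struct s2n_connection *conn, const uint8_t *data, ssize_t len)
{
    ssize_t written = 0;
    s2n_blocked_status blocked = S2N_NOT_BLOCKED;

    while (written < len) {
        ssize_t w = s2n_send(conn, data + written, len - written, &blocked);
        if (w < 0) {
            if (s2n_error_get_type(s2n_errno) == S2N_ERR_T_BLOCKED) {
                /* Wait for the socket to become writable (e.g. via poll()),
                 * then retry with the same remaining range: data + written. */
                continue;
            }
            return -1; /* hard error */
        }
        written += w;
    }
    return 0;
}
```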
Better APIs
While improving the docs is a step forward, I actually think more misuse-resistant APIs - which are a core goal of s2n - might be a better alternative. The s2n connection represents a bidirectional byte stream. It should therefore behave, API-wise, like other Stream APIs do:
- Writes on a Stream return the number of bytes transferred from the caller's buffer into the underlying Stream.
- The Stream might be buffered - which means it is not necessarily guaranteed that the Stream has flushed all its data. Flushing can instead happen during a follow-up `write()`, or during an explicit `flush()` call.
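As an analogy only, buffered stdio already separates these two steps: the write reports what was copied into the buffer, and an explicit flush pushes it to the underlying descriptor.

```c
#include <stdio.h>
#include <string.h>

/* Analogy: fwrite() reports how many bytes were copied into the FILE's
 * internal buffer; they may not have reached the fd yet. An explicit
 * fflush() (or a later write/close) pushes the buffered bytes out. */
int buffered_write_example(FILE *fp, const char *msg)
{
    size_t copied = fwrite(msg, 1, strlen(msg), fp); /* bytes accepted into the buffer */
    if (copied < strlen(msg)) {
        return -1; /* short write into the buffer */
    }
    return fflush(fp); /* 0 on success: everything reached the underlying fd */
}
```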
That behavior matches other buffered write APIs. As discussed however, this kind of API also introduces some new challenges:
- If we change the contract of the `write` API to only return the number of copied bytes, then it can no longer convey that an internal flush did not run to completion because the underlying stream returned `EAGAIN`. So some bytes stay buffered in S2N without the application knowing about it, and the socket status might have changed to non-writable without the application being aware of it. The application would only notice this at the next write call, or during `shutdown`. This seems OK for normal writes. However, e.g. for keep-alive connections, applications also have to flush remaining data to the socket without wanting to close the socket. The `flush` API is such an API.
- The `flush` API of s2n is however so far not exported.
- Even if `flush` were exported, applications would need to be changed to make use of it and to deal with the fact that the `write` return value does not reflect socket readiness anymore. We might need to communicate socket readiness - and potentially the amount of buffered bytes inside S2N - in a side channel.
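To illustrate the direction (purely hypothetical; none of the following functions, names or signatures exist in s2n-tls today), a more stream-like surface with an explicit flush and a buffered-bytes side channel could look roughly like this:

```c
#include <stdint.h>
#include <s2n.h>

/* HYPOTHETICAL sketch, not part of s2n-tls.
 *
 * Copies up to `size` bytes from `buf` into s2n's internal buffer and
 * returns the number of bytes copied. Bytes reported here never have to
 * be resubmitted; whether they reached the socket is a separate concern. */
ssize_t s2n_send_buffered(struct s2n_connection *conn, const void *buf,
                          ssize_t size, s2n_blocked_status *blocked);

/* HYPOTHETICAL sketch, not part of s2n-tls.
 *
 * Tries to flush internally buffered records to the socket. On return,
 * `bytes_still_buffered` (the side channel discussed above) reports how
 * much data remains queued inside s2n. */
int s2n_flush(struct s2n_connection *conn, s2n_blocked_status *blocked,
              uint32_t *bytes_still_buffered);
```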
"We might need to communicate socket readiness - and potentially the amount of buffered bytes inside S2N - in a side channel."
Why not just add an extra `uint32_t *num_buffered_bytes_out` parameter?
If the fix is an API change, would this need to be an "s2n_sendv2"? or some other new name, to maintain back compat
> If the fix is an API change, would this need to be an "s2n_sendv2"? or some other new name, to maintain back compat
If the current API is actually not meeting the behavior expected by customers, I would be more inclined to fix it. However, if the fix might break customers' integration with the library, then we have no choice other than delivering a new API and flagging the old one as deprecated.
> Even if `flush` were exported, applications would need to be changed to make use of it and to deal with the fact that the `write` return value does not reflect socket readiness anymore. We might need to communicate socket readiness - and potentially the amount of buffered bytes inside S2N - in a side channel.
This might be a requirement for TLS1.3 even without changing the behavior of s2n_send: https://github.com/aws/s2n-tls/issues/3504
> If the current API is actually not meeting the behavior expected by customers, I would be more inclined to fix it. However, if the fix might break customers' integration with the library, then we have no choice other than delivering a new API and flagging the old one as deprecated.
We might be able to add a config setting to change the buffering behavior. That's how we fixed a similar "very old and now accepted use" bug on the receive side: https://github.com/aws/s2n-tls/commit/2d5ddb3ae0bcc9b01d5d71906a604e489a56d547 It's not ideal, but it's an option.
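If the config-setting route were taken, the opt-in might look something like the following sketch (the function name and flag are hypothetical; no such setter exists today):

```c
#include <stdbool.h>
#include <s2n.h>

/* HYPOTHETICAL sketch, not an existing s2n-tls API: a config flag that opts
 * connections into "return the number of copied bytes" send semantics,
 * similar in spirit to how the receive-side buffering fix was gated. */
int s2n_config_set_send_reports_copied_bytes(struct s2n_config *config, bool enabled);
```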