embedded-nal TCP socket state/error handling insufficiently specified

The correspondence of the TcpClientStack API to TCP socket states is currently not specified clearly enough to be able to write robust TCP client applications against the generic embedded-nal interface. For instance:

What does is_connected mean? That connect has been called on the socket? That the socket is ESTABLISHED in TCP terms? That there might still be data to receive()? (where the socket could already be in CLOSE-WAIT, but still with data in the receive buffer)
What does connect() returning Ok() instead of WouldBlock mean? is_connected() == true immediately afterwards?
What happens if the socket is closed gracefully on the remote side? Does receive() return Ok(0), like a zero-length read in the Berkeley sockets API? Or does it return some sort of error? If the latter, how does a client application differentiate between that and unexpected errors?
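
To make the ambiguity concrete, here is a minimal sketch of the three readings of is_connected from the questions above. Everything in it (TcpState, SocketView and the method names) is hypothetical and made up for illustration; none of it is part of embedded-nal or smoltcp.

```rust
/// Hypothetical subset of TCP states, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TcpState {
    Closed,
    SynSent,
    Established,
    /// The peer has sent FIN; data may still sit in the receive buffer.
    CloseWait,
}

/// Hypothetical snapshot of a socket (not an embedded-nal type).
pub struct SocketView {
    pub state: TcpState,
    pub connect_called: bool,
    pub rx_buffered: usize,
}

impl SocketView {
    /// Reading 1: connect() has been called on the socket.
    pub fn connect_was_called(&self) -> bool {
        self.connect_called
    }

    /// Reading 2: the socket is ESTABLISHED in TCP terms.
    pub fn is_established(&self) -> bool {
        self.state == TcpState::Established
    }

    /// Reading 3: receive() might still yield data.
    pub fn may_yield_data(&self) -> bool {
        match self.state {
            TcpState::Established => true,
            // After the peer's FIN only already-buffered data can arrive.
            TcpState::CloseWait => self.rx_buffered > 0,
            TcpState::Closed | TcpState::SynSent => false,
        }
    }
}
```

For a socket in CLOSE-WAIT with 16 bytes still buffered, readings 1 and 3 say "connected" while reading 2 says "not connected", so the three interpretations really do disagree.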

This is complexity inherent to the problem domain, and it needs to be (carefully) exposed in any interface abstracting TCP sockets. Otherwise, it is simply not possible to write robust applications using embedded-nal. For instance, take the common case where an embedded device just wants to try reconnecting to some server when there is some sort of connection problem, without any care for the exact state of the previous connection: is it okay to try and connect the socket again after is_connected() returns false, or is a new socket() necessary?
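
As long as this is unspecified, the only conservative option I can see is to never reuse a socket: close the old one and open a fresh one on every reconnect. A rough sketch of that, assuming a simplified, hypothetical Stack trait that only loosely mirrors the socket()/connect()/close() shape of TcpClientStack (no nb, no remote address), with a MockStack that exists purely to exercise it:

```rust
/// Hypothetical, simplified stand-in for a TCP client stack.
/// This is NOT the embedded-nal trait, just the same rough shape.
pub trait Stack {
    type Socket;
    type Error;
    fn socket(&mut self) -> Result<Self::Socket, Self::Error>;
    fn connect(&mut self, socket: &mut Self::Socket) -> Result<(), Self::Error>;
    fn close(&mut self, socket: Self::Socket) -> Result<(), Self::Error>;
}

/// Conservative reconnect: never reuse a socket whose state is unknown.
pub fn reconnect<S: Stack>(
    stack: &mut S,
    old: Option<S::Socket>,
) -> Result<S::Socket, S::Error> {
    if let Some(socket) = old {
        // The old connection's state is unknown, so a close error
        // tells us nothing actionable; ignore it.
        let _ = stack.close(socket);
    }
    let mut fresh = stack.socket()?;
    stack.connect(&mut fresh)?;
    Ok(fresh)
}

/// Mock that counts opened and closed sockets (illustration only).
#[derive(Default)]
pub struct MockStack {
    pub opened: u32,
    pub closed: u32,
}

impl Stack for MockStack {
    type Socket = u32;
    type Error = ();

    fn socket(&mut self) -> Result<u32, ()> {
        self.opened += 1;
        Ok(self.opened)
    }

    fn connect(&mut self, _socket: &mut u32) -> Result<(), ()> {
        Ok(())
    }

    fn close(&mut self, _socket: u32) -> Result<(), ()> {
        self.closed += 1;
        Ok(())
    }
}
```

This costs a socket handle per reconnect, which may or may not be acceptable on a given stack; whether reusing the old socket would also work is exactly the part that needs to be specified.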

For an example of a library that has (save for a few typos) done a decent job at this, see https://docs.rs/smoltcp/0.7.0/smoltcp/socket/struct.TcpSocket.html.

Mar 07 '21 04:03 dnadlinger

This sounds like a very valid point!

Do you have anything in mind that might fix this? I have only used the embedded-nal with offloaded IP-stacks based on AT-commands, in which case some of the usual socket states kinda disappear, because they are handled by the co-processor.

Mar 08 '21 07:03 MathiasKoch

To add to this:

What happens when close() is called? Is it just the send half that is closed, so it's still possible to receive? Or is the entire socket aborted? Does it send FIN or RST?
Does written but not yet sent data still get sent after close()? Or does it get discarded?

A common issue is when doing data transfers by writing the data then gracefully closing the socket (with a FIN). The sending side must flush all the data then send the FIN. The receiving side must be able to distinguish a close by FIN (data is complete) from a close by RST or timeout (data may be truncated). See https://github.com/smoltcp-rs/smoltcp/issues/349
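
A sketch of what the receiving side needs from the API for this to work, assuming a hypothetical RecvEvent type that reports FIN and RST/timeout separately. The names are made up for illustration and are not embedded-nal or smoltcp API; Vec is used only to keep the example short, a no_std version would use a fixed buffer.

```rust
/// Hypothetical receive outcomes a stack would have to distinguish.
#[derive(Debug, PartialEq, Eq)]
pub enum RecvEvent {
    /// Payload bytes arrived.
    Data(Vec<u8>),
    /// The peer closed gracefully with FIN: no more data will follow.
    Fin,
    /// The connection died (RST or timeout): data may be missing.
    Reset,
}

/// Result of a "write everything, then close" style transfer.
#[derive(Debug, PartialEq, Eq)]
pub enum Transfer {
    Complete(Vec<u8>),
    Truncated(Vec<u8>),
}

/// Collects a transfer and reports whether it ended with a FIN.
pub fn collect_transfer<I>(events: I) -> Transfer
where
    I: IntoIterator<Item = RecvEvent>,
{
    let mut buf = Vec::new();
    for event in events {
        match event {
            RecvEvent::Data(chunk) => buf.extend_from_slice(&chunk),
            RecvEvent::Fin => return Transfer::Complete(buf),
            RecvEvent::Reset => return Transfer::Truncated(buf),
        }
    }
    // The events ran out without a FIN, so completeness cannot be claimed.
    Transfer::Truncated(buf)
}
```

If receive() folds FIN and RST into the same return value or the same error, the Complete/Truncated distinction cannot be made at all, no matter how the application is written.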

In my experience, coprocessor TCP stacks always get all of this horribly wrong. I've seen such AT command stacks silently drop data, on both the sending and the receiving side. And most don't expose all of these needed TCP state details. You simply can't rely on this; all you can know is "socket is ESTABLISHED, or it's... in whatever other state, who knows!".

In my experience the only way to write reliable software with such stacks is to write your own framing with checksums and retries, which kinda defeats the point of TCP.

Or don't use the modem's stack at all, and use PPP + smoltcp :)

> save for a few typos

PRs welcome ;)

Mar 18 '21 17:03 Dirbaio

I would love nothing more than to swap our coprocessor's shitty TCP stack for smoltcp. But I doubt it will actually happen until we have an embedded-friendly TLS stack as well. Can't wait for rustls to gain no_std and no-alloc support at some point; it will be a true game changer for embedded Rust in IoT applications.

Mar 18 '21 18:03 MathiasKoch
