embedded-nal
TCP socket state/error handling insufficiently specified
The correspondence of the TcpClientStack API to TCP socket states is currently not specified clearly enough to be able to write robust TCP client applications against the generic embedded-nal interface. For instance:
- What does `is_connected` mean? That `connect` has been called on the socket? That the socket is ESTABLISHED in TCP terms? That there might still be data to `receive()`? (The socket could already be in CLOSE-WAIT but still have data in the receive buffer.)
- What does `connect()` returning `Ok()` instead of `WouldBlock` mean? Is `is_connected() == true` guaranteed immediately afterwards?
- What happens if the socket is closed gracefully on the remote side? Does `receive()` return `Some(0)`, like in the Berkeley sockets API? Or does it return some sort of error? If the latter, how does a client application differentiate between that and unexpected errors?
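To make the `is_connected` ambiguity concrete, here is a minimal sketch (not embedded-nal's API) that separates the RFC 793 connection state from the questions "may I still send?" and "may I still receive?", loosely following the `may_send`/`may_recv` distinction that smoltcp's `TcpSocket` draws. `SocketView` and its fields are hypothetical. Note how a socket in CLOSE-WAIT with buffered data is not ESTABLISHED, yet can still be read.

```rust
/// The eleven TCP connection states from RFC 793.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TcpState {
    Closed,
    Listen,
    SynSent,
    SynReceived,
    Established,
    FinWait1,
    FinWait2,
    CloseWait,
    Closing,
    LastAck,
    TimeWait,
}

/// Hypothetical view of a socket: protocol state plus the number of
/// bytes still sitting in the local receive buffer.
pub struct SocketView {
    pub state: TcpState,
    pub rx_buffered: usize,
}

impl SocketView {
    /// One possible reading of `is_connected`: the socket is ESTABLISHED.
    pub fn is_established(&self) -> bool {
        self.state == TcpState::Established
    }

    /// Our transmit half is still open (in CLOSE-WAIT only the peer has sent FIN).
    pub fn may_send(&self) -> bool {
        matches!(self.state, TcpState::Established | TcpState::CloseWait)
    }

    /// The peer has not sent FIN yet, or there is still buffered data to read.
    pub fn may_recv(&self) -> bool {
        matches!(
            self.state,
            TcpState::Established | TcpState::FinWait1 | TcpState::FinWait2
        ) || self.rx_buffered > 0
    }
}
```

A single boolean cannot answer all three questions at once, which is why a generic interface would probably need to expose something along these lines.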
This complexity is inherent to the problem domain and needs to be (carefully) exposed by any interface abstracting TCP sockets; otherwise, it is simply not possible to write robust applications using embedded-nal. Take even the common case where an embedded device just wants to reconnect to a server after some sort of connection problem, without caring about the exact state of the previous connection: is it okay to call `connect` on the same socket again after `is_connected` returns false, or is a new `socket()` necessary?
For an example of a library that has (save for a few typos) done a decent job at this, see https://docs.rs/smoltcp/0.7.0/smoltcp/socket/struct.TcpSocket.html.
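Until the reconnect question is specified, one conservative way to sidestep it is to never reuse a socket whose state is unknown: close it, allocate a fresh one and poll `connect` until it settles. The sketch below assumes a hypothetical, simplified trait (`SimpleTcpClient`, with `NbError` standing in for `nb::Result`'s `WouldBlock`); it is modelled loosely on embedded-nal's `socket`/`connect`/`close` but does not reproduce the real `TcpClientStack` signatures.

```rust
/// Stand-in for `nb::Result`'s error side: a real error or "try again later".
#[derive(Debug, PartialEq)]
pub enum NbError<E> {
    WouldBlock,
    Other(E),
}

/// Hypothetical, simplified client stack; NOT the real `TcpClientStack` trait.
pub trait SimpleTcpClient {
    type Socket;
    type Error;
    fn socket(&mut self) -> Result<Self::Socket, Self::Error>;
    fn connect(&mut self, s: &mut Self::Socket) -> Result<(), NbError<Self::Error>>;
    fn close(&mut self, s: Self::Socket) -> Result<(), Self::Error>;
}

/// Conservative reconnect: discard the old socket, open a new one and poll
/// `connect` at most `max_polls` times. `Err(None)` means polling gave up.
pub fn reconnect<S: SimpleTcpClient>(
    stack: &mut S,
    old: Option<S::Socket>,
    max_polls: usize,
) -> Result<S::Socket, Option<S::Error>> {
    if let Some(sock) = old {
        // Close errors are ignored: the old socket is being thrown away anyway.
        let _ = stack.close(sock);
    }
    let mut sock = stack.socket().map_err(Some)?;
    for _ in 0..max_polls {
        match stack.connect(&mut sock) {
            Ok(()) => return Ok(sock),
            // Handshake still in progress; poll again.
            Err(NbError::WouldBlock) => continue,
            Err(NbError::Other(e)) => {
                let _ = stack.close(sock);
                return Err(Some(e));
            }
        }
    }
    // Gave up waiting; release the half-open socket.
    let _ = stack.close(sock);
    Err(None)
}
```

This trades some efficiency (a new socket per attempt) for not depending on whether a stack allows `connect` on a previously used socket.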
This sounds like a very valid point!
Do you have anything in mind that might fix this? I have only used embedded-nal with offloaded IP stacks based on AT commands, in which case some of the usual socket states kinda disappear, because they are handled by the co-processor.
To add to this:
- What happens when `close()` is called? Is only the send half closed, so it's still possible to receive? Or is the entire socket aborted? Does it send FIN or RST?
- Does data that has been written but not yet sent still get sent after `close()`, or is it discarded?
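The two behaviours these questions distinguish can be modelled in a few lines. In Berkeley sockets terms they correspond roughly to `shutdown(SHUT_WR)` (flush, then FIN, keep receiving) versus an abortive close with `SO_LINGER` set to zero (discard, then RST). `CloseMode`, `ModelSocket` and `Segment` are hypothetical names for illustration only; the point is that a generic `close()` has to commit to one of these, or offer both.

```rust
/// What ends up on the wire (hypothetical model, not a real stack).
#[derive(Debug, PartialEq)]
pub enum Segment {
    Data(Vec<u8>),
    Fin,
    Rst,
}

pub enum CloseMode {
    /// Half-close: flush unsent data, then send FIN. Receiving stays possible.
    Graceful,
    /// Abort: discard unsent data and send RST. Nothing more can be received.
    Abort,
}

pub struct ModelSocket {
    pub unsent: Vec<u8>,
    pub wire: Vec<Segment>,
    pub can_recv: bool,
}

impl ModelSocket {
    pub fn close(&mut self, mode: CloseMode) {
        match mode {
            CloseMode::Graceful => {
                if !self.unsent.is_empty() {
                    // Written-but-unsent bytes still go out before the FIN.
                    self.wire.push(Segment::Data(std::mem::take(&mut self.unsent)));
                }
                self.wire.push(Segment::Fin);
                // Only our send half is closed; `can_recv` stays as it was.
            }
            CloseMode::Abort => {
                // Pending data is silently lost.
                self.unsent.clear();
                self.wire.push(Segment::Rst);
                self.can_recv = false;
            }
        }
    }
}
```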
A common issue arises with data transfers done by writing the data and then gracefully closing the socket (with a FIN). The sending side must flush all the data, then send the FIN. The receiving side must be able to distinguish a close by FIN (data is complete) from a close by RST or timeout (data may be truncated). See https://github.com/smoltcp-rs/smoltcp/issues/349
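On the receiving side, this only works if the API reports how the stream ended rather than a bare error. A minimal sketch, assuming a hypothetical `RecvEvent` type (not part of embedded-nal or smoltcp) that separates FIN from RST and timeout:

```rust
/// One event from a hypothetical receive API that reports *how* a stream ended.
#[derive(Debug, Clone, PartialEq)]
pub enum RecvEvent {
    Data(Vec<u8>),
    /// Peer sent FIN: everything it wrote has arrived.
    Fin,
    /// Peer sent RST.
    Reset,
    /// Connection timed out.
    TimedOut,
}

#[derive(Debug, PartialEq)]
pub enum Transfer {
    Complete(Vec<u8>),
    Truncated(Vec<u8>),
}

/// Collect a "write everything, then close" transfer.
/// Only a FIN proves the data is complete.
pub fn collect<I: IntoIterator<Item = RecvEvent>>(events: I) -> Transfer {
    let mut buf = Vec::new();
    for ev in events {
        match ev {
            RecvEvent::Data(chunk) => buf.extend_from_slice(&chunk),
            RecvEvent::Fin => return Transfer::Complete(buf),
            RecvEvent::Reset | RecvEvent::TimedOut => return Transfer::Truncated(buf),
        }
    }
    // Event source ended without any close indication: assume truncation.
    Transfer::Truncated(buf)
}
```

If a stack maps FIN, RST and timeout onto the same error (or silently onto the same `0` return), `collect` cannot be written correctly, which is exactly the problem described above.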
In my experience, co-processor TCP stacks always get all of this horribly wrong. I've seen such AT command stacks silently drop data, on both the sending and the receiving side. And most don't expose all these needed TCP state details. You simply can't rely on this; all you can know is "socket is ESTABLISHED, or it's... in whatever other state, who knows!".
In my experience the only way to write reliable software with such stacks is to write your own framing with checksums and retries, which kinda defeats the point of TCP.
Or don't use the modem's stack at all, and use PPP + smoltcp :)
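For reference, the application-level framing with checksums mentioned above can be as small as this sketch. The 2-byte big-endian length prefix and the Fletcher-16 checksum are arbitrary illustrative choices (a CRC would catch more errors), `encode`/`decode` are hypothetical names, and the retry logic (e.g. resending on `BadChecksum`) is left out.

```rust
/// Fletcher-16 checksum.
pub fn fletcher16(data: &[u8]) -> u16 {
    let (mut s1, mut s2) = (0u16, 0u16);
    for &b in data {
        s1 = (s1 + b as u16) % 255;
        s2 = (s2 + s1) % 255;
    }
    (s2 << 8) | s1
}

/// Frame layout: [len_hi, len_lo, payload..., ck_hi, ck_lo].
/// The checksum covers the length prefix and the payload.
pub fn encode(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u16::MAX as usize {
        return None; // does not fit the 2-byte length prefix
    }
    let len = payload.len() as u16;
    let mut frame = Vec::with_capacity(payload.len() + 4);
    frame.extend_from_slice(&len.to_be_bytes());
    frame.extend_from_slice(payload);
    let ck = fletcher16(&frame);
    frame.extend_from_slice(&ck.to_be_bytes());
    Some(frame)
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// Not enough bytes buffered yet; wait for more.
    Incomplete,
    /// Frame arrived but was corrupted; request a retransmission.
    BadChecksum,
}

/// Returns the payload and the number of bytes consumed from `buf`.
pub fn decode(buf: &[u8]) -> Result<(&[u8], usize), DecodeError> {
    if buf.len() < 2 {
        return Err(DecodeError::Incomplete);
    }
    let len = u16::from_be_bytes([buf[0], buf[1]]) as usize;
    let total = 2 + len + 2;
    if buf.len() < total {
        return Err(DecodeError::Incomplete);
    }
    let expected = u16::from_be_bytes([buf[2 + len], buf[3 + len]]);
    if fletcher16(&buf[..2 + len]) != expected {
        return Err(DecodeError::BadChecksum);
    }
    Ok((&buf[2..2 + len], total))
}
```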
save for a few typos
PRs welcome ;)
I would love nothing more than to replace our co-processor's shitty TCP stack with smoltcp. But I doubt it will actually happen until we have an embedded-friendly TLS stack as well. I can't wait for rustls to gain no_std and no-alloc support at some point; it will be a true game changer for embedded Rust in IoT applications.