
Better buffer structures for TCP sockets

Open jordens opened this issue 5 years ago • 13 comments

Currently the SocketBuffer for the RX and TX sides of TCP sockets is a RingBuffer. This is unfortunate because it hands out sub-optimal partial chunks whenever the data (or free space) spans the wrap-around, which happens often in practice (with probability roughly packet/message size divided by buffer size). There is currently no smoltcp API to access the entire free space (for send()) or all buffered data (for recv()), nor would a higher protocol layer typically support such an API (you'd have to deal with two chunks). Since the upper protocol must always recv/send the partial chunk up to the wrap-around before it gets access to the remaining chunk, it needs to buffer and copy. It shouldn't have to: either more space/data is already available, or it can get there by merely waiting a bit and trying again later if the available space/data is insufficient.
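To make the wrap-around problem concrete, here is a tiny sketch (illustrative types only, not smoltcp's actual `RingBuffer`) of why the contiguous chunk a ring buffer can hand out shrinks near the end of its storage:

```rust
// Illustrative only: just enough state to show why the contiguous
// chunk shrinks at the wrap-around.
struct RingBuffer {
    storage: Vec<u8>,
    read_at: usize, // index of the oldest buffered byte
    length: usize,  // number of buffered bytes
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        RingBuffer { storage: vec![0; capacity], read_at: 0, length: 0 }
    }

    /// The largest slice a recv()-style call can hand out in one piece.
    /// Once the buffered data spans the end of `storage`, this is only
    /// the part up to the wrap-around, not everything that is buffered.
    fn contiguous_read_len(&self) -> usize {
        let until_end = self.storage.len() - self.read_at;
        self.length.min(until_end)
    }
}
```

With an 8-byte buffer holding 4 bytes starting at index 6, `contiguous_read_len()` is only 2 even though 4 bytes are buffered; the consumer sees the remaining 2 bytes only on a second call, which is what forces it to buffer and reassemble.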

There seem to be better data structures that would allow protocols on top of TCP to be implemented without having to buffer again at the protocol layer. Maybe bip buffers are such a structure. The interface wouldn't even be very different from RingBuffer. I don't know what other stacks are using.
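For illustration, a minimal single-threaded bip buffer can even be written in safe Rust. All names and layout below are made up for the example; real implementations such as bbqueue or spsc-bip-buffer additionally handle producer/consumer concurrency and are tuned for performance. The key idea is the grant/commit cycle: when the tail of the buffer is too small for a request, the writer records a watermark and wraps early, so every grant is contiguous.

```rust
/// Minimal single-threaded bip-buffer sketch (illustrative, not an
/// existing crate's API). Invariant: when `wrapped`, the writer sits
/// below the reader and valid data ends at `watermark`.
struct BipBuffer {
    storage: Vec<u8>,
    read: usize,
    write: usize,
    watermark: usize,
    wrapped: bool,
}

impl BipBuffer {
    fn new(capacity: usize) -> Self {
        BipBuffer { storage: vec![0; capacity], read: 0, write: 0, watermark: 0, wrapped: false }
    }

    /// Reserve `n` contiguous bytes for writing, wrapping early if the
    /// tail of the buffer is too small. None if no region fits yet.
    fn grant(&mut self, n: usize) -> Option<&mut [u8]> {
        if self.wrapped {
            // Writer is behind the reader; free space lies between them.
            if self.read - self.write >= n {
                Some(&mut self.storage[self.write..self.write + n])
            } else {
                None
            }
        } else if self.storage.len() - self.write >= n {
            Some(&mut self.storage[self.write..self.write + n])
        } else if self.read >= n {
            // Not enough room at the tail: mark the watermark and wrap.
            self.watermark = self.write;
            self.wrapped = true;
            self.write = 0;
            Some(&mut self.storage[..n])
        } else {
            None
        }
    }

    /// Make `n` previously granted bytes visible to the reader.
    fn commit(&mut self, n: usize) {
        self.write += n;
    }

    /// The contiguous readable region (always a single slice).
    fn read_slice(&self) -> &[u8] {
        let end = if self.wrapped { self.watermark } else { self.write };
        &self.storage[self.read..end]
    }

    /// Consume `n` bytes; jump back to the start at the watermark.
    fn release(&mut self, n: usize) {
        self.read += n;
        if self.wrapped && self.read == self.watermark {
            self.read = 0;
            self.wrapped = false;
        }
    }
}
```

The trade-off relative to a ring buffer: a few bytes before the watermark go temporarily unused, but the caller never sees a split chunk.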

jordens avatar Jun 24 '20 09:06 jordens

Wouldn't bip buffers require paging and use unsafe code? I'm not sure I like having either of those in smoltcp core, since being written in 100% safe Rust and not requiring anything weird from the platform are two of its selling points.

Having those as an option might be OK, but I'm also not sure if there is a way to do that cleanly.

whitequark avatar Jun 24 '20 09:06 whitequark

A possible alternative is an API that uses a list of chunks instead of a single chunk, similar to iovec.
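A sketch of what such a vectored interface might look like (`recv_vectored` is a hypothetical function, not an existing smoltcp API): the consumer is handed every buffered chunk at once, two when the ring buffer has wrapped, and reports how many bytes it consumed in total.

```rust
// Hypothetical vectored receive in the spirit of iovec. The consumer
// may parse across the chunk boundary without the socket ever copying
// the data into one contiguous region.
fn recv_vectored(chunks: &[&[u8]], consume: impl FnOnce(&[&[u8]]) -> usize) -> usize {
    let available: usize = chunks.iter().map(|c| c.len()).sum();
    // Never report more consumed than was actually buffered.
    consume(chunks).min(available)
}
```

The cost, as noted above, is pushed onto the upper layer: its parser has to accept input split across two slices instead of one.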

whitequark avatar Jun 24 '20 09:06 whitequark

Is there really no unsafe code in the existing dependencies? byteorder seems to have quite some. I suspect the bbqueue or spsc-bip-buffer authors would say they do it cleanly and that the unsafe code is fully analyzed and sound. I have no idea about iovec. Would it also allow you to just wait and try again later if the available space isn't large enough yet? An API that gives you just one, ultimately large enough, chunk to handle (to serialize your packet into) seems very appealing.

jordens avatar Jun 24 '20 10:06 jordens

Is there really no unsafe code in the existing dependencies? byteorder seems to have quite some. I suspect the bbqueue or spsc-bip-buffer authors would say they do it cleanly and that the unsafe code is fully analyzed and sound.

Fair point. We can do that.

An API that gives you just one, ultimately large enough, chunk to handle (to serialize your packet into) seems very appealing.

I gave bbqueue a quick look, seems OK to me overall. Can we make it an opt-in feature without breaking the overall API? If so (seems like we can, but maybe I'm missing something), I'm happy to merge this.

whitequark avatar Jun 24 '20 10:06 whitequark

Yes, I think this wouldn't change any API, and it can be optional. But let's get some more input on this. I don't feel like the expert here.

jordens avatar Jun 24 '20 10:06 jordens

Just to add to this discussion - we're working on an MQTT implementation on top of smoltcp which uses a TCP socket as the underlying transport mechanism.

From our point of view, a bip buffer would play very nicely. Potentially, our API would transform from

  1. Construct MQTT packet in local buffer (stack/heap)
  2. Write data to TCP socket
  3. (Internally) copy packet buffer into TCP socket buffer

into:

  1. Request buffer from the TCP socket
  2. Construct MQTT packet in-place in the requested buffer
  3. Commit changes to the TCP socket
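The proposed flow could look roughly like the following. All names here are hypothetical (`FakeTcpSocket`, `send_grant`, `send_commit`); smoltcp's TCP socket does not expose such a grant/commit API today, so this only models the three steps above:

```rust
// Hypothetical socket with a grant/commit TX path.
struct FakeTcpSocket {
    tx: Vec<u8>,      // the socket's TX buffer
    committed: usize, // bytes already queued for transmission
}

impl FakeTcpSocket {
    fn new(capacity: usize) -> Self {
        FakeTcpSocket { tx: vec![0; capacity], committed: 0 }
    }

    /// Step 1: hand the caller a contiguous region of the TX buffer,
    /// or None if the packet cannot fit right now.
    fn send_grant(&mut self, n: usize) -> Option<&mut [u8]> {
        let start = self.committed;
        if self.tx.len() - start >= n {
            Some(&mut self.tx[start..start + n])
        } else {
            None
        }
    }

    /// Step 3: mark `n` granted bytes as ready to transmit.
    fn send_commit(&mut self, n: usize) {
        self.committed += n;
    }
}

/// Step 2: serialize an MQTT PINGREQ (0xC0 0x00) directly into the
/// granted region -- no intermediate packet buffer, and nothing at all
/// is written when there is not enough space.
fn send_mqtt_pingreq(sock: &mut FakeTcpSocket) -> bool {
    match sock.send_grant(2) {
        Some(buf) => {
            buf[0] = 0xC0; // fixed header: PINGREQ
            buf[1] = 0x00; // remaining length = 0
            sock.send_commit(2);
            true
        }
        None => false, // insufficient space: retry later, no partial write
    }
}
```

Either the grant succeeds and the whole packet goes out, or nothing is written and the caller simply retries later, which is exactly the all-or-nothing property described below.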

This has a number of advantages:

  1. We don't have to worry about writing half a packet to the TCP socket due to insufficient space in the underlying buffer (although this can be avoided by checking the available size, we're using a generic trait implementation of embedded-nal, which doesn't let us check the underlying buffer size)
  2. We eliminate a copy of the data, which speeds up protocol execution and increases throughput.

I'd be happy to look into implementing this if there's interest.

ryan-summers avatar Jun 29 '20 11:06 ryan-summers

Not to detract from the (potential) merits of contiguous buffers and avoiding copies, but I don't quite see the issue with

We don't have to worry about potentially writing half a packet to the TCP socket.

After all, TCP is stream-based, not message-based, so receivers can deal just fine with the second half arriving a while later when there was buffer space available for it to be sent.

dnadlinger avatar Jul 18 '20 16:07 dnadlinger

@dnadlinger From a higher-level perspective, this means the layer above the TCP socket then has to keep state about the half-a-packet it sent (if the high-level protocol doesn't support invalid packet correction/detection, or if the half-packet causes some form of sync-loss). This then forces the upper layer to keep a buffer for the packet around, even if it doesn't necessarily want to.

Basically, with the proposed implementation, the higher-level layer knows in advance whether or not it can send the datagram it wants to. It doesn't have to fall back to a "try-then-recover" approach when the whole transfer couldn't complete.

ryan-summers avatar Jul 19 '20 06:07 ryan-summers