smoltcp [RFC] Packet dispatch

Problem

Currently, sloltcp uses iteration to dispatch ingress packets to sockets and to poll sockets for egress packets. While acceptable on the platforms without libstd and memory allocation, this approach is very inefficient on systems where growable containers are available.

Discussion

The problem is two-fold: ingress packets should be dispatched by lookup tables based on their destination, while only "dirty" sockets should be polled for egress packets.
After #18 is merged, there will be two kinds of sockets: l3 (raw) sockets should be dispatched to based on packet's IP version and IP protocol number, and l4 sockets (tcp and udp) should be dispatched to based on the dst port.
Sockets can be bound to a specific IP address. As a result, a solution should support several sockets of the same type listening on the same port but bound to different IP addresses.
A solution should keep the existing behavior when compiled without the std feature.

Proposal

Dispatch packets to sockets based on the port/protocol number only, leaving the question of sockets being bound to IPs to the Socket::proccess method.
Expand SocketSet by (conditionally) adding three std::HashMap's to it, for raw, tcp and udp sockets, and a Vec to keep dirty sockets' handlers.
Hash map for raw sockets should have a type of std::HashMap<(RawSocketType, IpProtocol), Vec<Handle>>, hash maps for udp and tcp should have a type of std::HashMap<u16, Vec<Handle>> . Vec is required to support the possibility of several sockets listening to the same port/protocol on different addresses.
Extend SocketSet::add to use socket's endpoint and ip_protocol methods to construct an index to insert socket's handler into the corresponding lookup table.
Extend SocketSet::remove to remove socket from a lookup table based on socket's endpoint.
Extend Set::IterMut to support construction from a mutable reference to ManagedSlice and an iterator of Handles, producing an iterator of Socket.
Replace SocketSet::iter_mut with iter_raw(&mut self, ip_repr: &IpRepr) -> IterMut and iter_l4(&mut self, ip_repr: &IpRepr) -> IterMut, which should look up sockets based on ip_repr and return an iterator over the sockets found. With the std feature disabled, the functions should return iterators over the entire SocketSet.
Add iter_dirty to SocketSet to produce an iterator of Socket by draining dirty_sockets vector, or an iterator over the entire SocketSet if std is disabled.
Add mark_drity to SocketSet to mark a socket as dirty.
Replace iter_mut in EthernetInterface::poll with iter_raw and iter_l4 and in EthernetInterface::emit with iter_dirty.

Questions

The only breakage of the current public API would be the requirement to explicitly mark written-to sockets as dirty. This can be avoided by having SocketSet::get_mut mark each requested socket as dirty (which kinda defeats the purpose), by having a specialized method SocketSet::get_mut_for_write to do so, or by devising some smart handler type which would mark a socket as dirty on send.
I'm not sure if the reference counting in Set::Item is relevant here.
I think EthernetInterface::poll should be refactored into several methods to increase readability.

Jun 18 '17 15:06 batonius

Sockets can be bound to a specific IP address. As a result, a solution should support several sockets of the same type listening on the same port but bound to different IP addresses.

This already works, doesn't it?

while only "dirty" sockets should be polled for egress packets

I have an idea that doesn't involve even more conditional compilation. Add two storage items:

Every socket should gain a "dirty" flag, internally usable by SocketSet;
SocketSet should gain an additional storage area, sized the same as its socket list, structured as a ring buffer, containing indexes of sockets. Every time a clean socket is made dirty, an index of the dirty socket is added to that ring buffer. This makes it possible to do O(1) polling for egress packets.

Jun 18 '17 21:06 whitequark

Regarding ingress, using BTreeMap instead of a HashMap reduces the std requirement to a mere collections requirement, and makes it possible to do fast ingress handling on no_std code. Otherwise it does look like the only option to me.

Jun 18 '17 21:06 whitequark

This already works, doesn't it?

Yeah, I meant it for the packet dispatch mechanism, meaning a simple port -> socket map won't work.

I have an idea that doesn't involve even more conditional compilation

Right, we don't need a vector for this, we can use ManagedSlice.

Every time a clean socket is made dirty, an index of the dirty socket is added to that ring buffer.

How tho? Right now a Socket doesn't hold a reference back to the SocketSet it's in to report it's dirty and adding it would require Weak<RefCell<>> or something like this. I have some ideas I've sketched above but I don't like any of them.

using BTreeMap instead of a HashMap

Right, so far it looks like

#[cfg(any(feature = "std", feature = "collections"))]
#[derive(Debug)]
struct LookupTables {
    raw: BTreeMap<(RawPayloadType, IpProtocol), BTreeSet<usize>>,
    tcp: BTreeMap<u16, BTreeSet<usize>>,
    udp: BTreeMap<u16, BTreeSet<usize>>,
    dirty: Vec<usize>,
}

#[cfg(not(any(feature = "std", feature = "collections")))]
#[derive(Debug)]
struct LookupTables {}

#[derive(Debug)]
pub struct Set<'a, 'b: 'a, 'c: 'a + 'b> {
    sockets: ManagedSlice<'a, Option<Item<'b, 'c>>>,
    lookup_tables: LookupTables,
}

Jun 18 '17 22:06 batonius

OK, this looks more or less fine to me. A few suggestions:

Let's keep SocketSet and what you call LookupTables separate. Right now, with the BTreeSet<usize>, there is a potential ABA problem. If you use SocketHandles from the public API of SocketSet then it is not an issue.
Let's call LookupTables a smoltcp::routing::SocketRouter, since that's what it does: it routes packets to and from sockets. In the future I can see there being smoltcp::routing::NodeRouter, and they will probably share some data structures even.
The API consumer of smoltcp has to uphold the contract of SocketRouter to ensure that packets are promptly dispatched, that is, it has to set the dirty flag in SocketRouter at the same time as actually dirtying the socket. (This is somewhat ugly, but based on my experience wrapping smoltcp, it is not going to be a problem.) Therefore SocketRouter is external to all of: SocketSet, Socket (of any kind), and EthernetInterface. In fact I suggest that SocketRouter be passed into poll just like SocketSet.
Almost always, there will be exactly one socket per port in the entire SocketRouter since multi-address setups are rare. How about using (u16, IpAddress) (note the order) as the key of BTreeMap? This gets rid of allocations and indirections in to BTreeSet, which almost always are just wasting resources, but does not (I believe) pessimize the comparisons because almost always the comparison will return early after determining that the ports are not equal. Of course you still have to compare the IP address in the key but a 4/16 byte comparison should be cheaper than two pointer accesses.

Also, I encourage you to get early feedback by updating the PR in small pieces.

Jun 21 '17 04:06 whitequark

I don't like the idea of using routing in the names for packet dispatch. "Routing" is a well-defined concept and packet dispatch isn't usually a part of it. For starters, there are no "routes" involved :) I think something like smoltcp::dispatching::Dispatcher or DispatchTable would be a better name.

How about using (u16, IpAddress) (note the order) as the key of BTreeMap

This means we would have to do two searches, first with IpAddress set to the src_ip from the packet to search for an address-bound socket and then, if it fails (and it will more often than not), with IpAddress::Undefined to search for a not bound one. I'm not against it tho, it may be better from the API pov, since we would return an Option<Handler> and not an iterator.

Also, I think I'll make two PRs, since ingress and egress dispatching are mostly independent.

Jun 21 '17 10:06 batonius

This means we would have to do two searches, first with IpAddress set to the src_ip from the packet to search for an address-bound socket and then, if it fails (and it will more often than not), with IpAddress::Undefined to search for a not bound one.

Nope! We can use BTreeMap::range to perform just one search.

Jun 21 '17 18:06 whitequark

I don't like the idea of using routing in the names for packet dispatch. "Routing" is a well-defined concept and packet dispatch isn't usually a part of it. For starters, there are no "routes" involved :) I think something like smoltcp::dispatching::Dispatcher or DispatchTable would be a better name.

Agreed. On further thinking smoltcp::socket::SocketDispatcher sounds good, since it is functionally adjacent to smoltcp::socket::SocketSet in every way.

Jun 21 '17 19:06 whitequark

Nope! We can use BTreeMap::range to perform just one search.

I don't get it. Suppose you have three sockets, *:123, 192.168.1.1:123 and 192.168.1.3:123. They are stored in the BTreeMap in this order. How would you construct a range to find the right socket for each address in 192.168.1.1..192.168.1.4?

Jun 21 '17 22:06 batonius

How would you construct a range to find the right socket for each address in 192.168.1.1..192.168.1.4?

Why do you need this?

Jun 21 '17 22:06 whitequark

So here's my current attempt: https://github.com/m-labs/smoltcp/compare/master...batonius:packet_dispatch . I've spent some time on these lifetimes :)

What I'm thinking of doing next is

Add process_repr methods to all the sockets which would be the same as process but would accept tcp/udp/raw reprs.
Add l4 header parsing to EthernetInterface::poll, call get_tcp_sockets or get_udp_sockets depending on l4, and loop over the resulting iterator + filter_map(<Socket as AsSocket<TcpSocket>>::try_as_socket) calling process_repr on the sockets. This way we won't be parsing l4 headers two times.

I'm not sure I like SocketDispater being a separate entity from the API pov, the usage in the tests isn't quite friendly. You have to first add a socket to a SocketSet and then not forget to add it again to a Dispatcher, this time requesting it from the SocketSet. If we ignore egress packets for now, hiding Dispatcher inside SocketSet looks like a clean and reasonable solution.

Jun 22 '17 22:06 batonius

Add process_repr methods to all the sockets which would be the same as process but would accept tcp/udp/raw reprs.

You mean split process into the part that parses and accepts/rejects the packet and the rest? Hm.

Jun 22 '17 23:06 whitequark

Yes. Basically, we now need to parse l4 headers in EthernetInterface::poll to get the dst port to dispatch the packet anyway, so there's no reason to parse it again in Socket::proccess. These methods will be socket-specific, but we can iterate over specific sockets only with filter_map.

I'm not sure about the accept/reject part tho, since without std/collections we're still gonna iterate over all the sockets so we can't just assume the packet is for the socket in these new methods. We can have separate versions for std/no_std, but I'm not sure if it's worth it since the accept/reject check is just a couple of u32/u16 comparisons.

Jun 22 '17 23:06 batonius

I'm not happy about this design. How about we factor out the IP handling machinery from EthernetInterface and then keep all of our L4 handling code there?

Jun 23 '17 00:06 whitequark

I'm all for it, I've stated EthernetInterface::poll as an issue in the OP :)

I propose I rename that I called SocketDispatcher to SocketDispatchTable, put it inside SocketSet, add iface/ip_dispatcher.rs with IpDispatcher and move all the IP processing logic (everything from the EthernetProtocol::Ipv4 branch) to it, refactoring it into several methods in the process, while keeping the EthernetInterface's interface intact. Again, I'm explicitly ignoring egress packets for now.

Jun 23 '17 07:06 batonius

move all the IP processing logic (everything from the EthernetProtocol::Ipv4 branch) to it, refactoring it into several methods in the process, while keeping the EthernetInterface's interface intact

Let's start with this. It's long overdue, will have a lot of benefits, and necessary for this anyway.

Jun 23 '17 07:06 whitequark

Here's my attempt at refactoring EthernetInterface::poll: https://github.com/m-labs/smoltcp/compare/master...batonius:poll_refactor . I've tried to keep the logic intact, but I've made two changes I think are reasonable:

We don't try to find a socket to process a packet with non-icmp/tcp/udp proto. Instead, we just send PROTO_UNREACHABLE If no raw socket processed it successfully.
We end the iteration over tcp/udp sockets after the first non-rejected one.

I haven't factored it out into a separate module as I was going to because these functions either have no state of their own or rely on EthernetInterface's internal state. With these changes, it should be trivial to plug in the socket dispatcher just by replacing iter_mut().filter_map with specialized search methods.

If you're ok with it, I'll make a PR so I can move on with my packet_dispatch branch.

Jun 25 '17 19:06 batonius

Looks all good! I have a few nits but no real issues.

Jun 26 '17 04:06 whitequark

In fact I went ahead and merged it.

Jun 26 '17 05:06 whitequark

I propose I rename that I called SocketDispatcher to SocketDispatchTable, put it inside SocketSet

Ack on putting it into SocketSet.

Jun 26 '17 06:06 whitequark

Moving forward, do you think you can cover EthernetInterface with tests? A basic coverage of every Ok and Err returned from the process_* functions would be a great start, I'll chime in then, adding support for the newly landed rustc support for gcov.

Jun 26 '17 06:06 whitequark

Moving forward, do you think you can cover EthernetInterface with tests?

Sure, I'll look into it.

Jun 26 '17 07:06 batonius

Something I just remembered that might be very relevant to your work is that having several open sockets in LISTEN state with the same local endpoint is perfectly legal. That's how listen backlog is implemented (by a layer on top of smoltcp).

Jun 26 '17 13:06 whitequark

Right, I somehow missed that point completely, it's not enough to dispatch tcp packets based on the dst endpoint, a server can have several clients connected to it, we need to consider src endpoint as well, and we don't know it until we established a connection. This means we need another layer of dispatching and a way for a socket to report the fact it has established a connection to a remote endpoint.

Jun 26 '17 13:06 batonius

Now I think if it, it should be easy enough to do by checking if socket's remote_endpoit has changed after process in process_tcpv4.

Jun 26 '17 14:06 batonius

Yeah that works.

Jun 26 '17 19:06 whitequark

I've updated https://github.com/m-labs/smoltcp/compare/master...batonius:packet_dispatch .

I'm not sure about the API yet, but my current plan is to add a smart pointer-like type that would hold mutable references both to Socket and DispatchTable and would implement Deref<Target=Socket> . SocketSet::get would return this smart pointer instead of a raw reference. The pointer should be able to check if Socket's endpoints have changed in drop, and if they have, update DispatcheTable accordingly. This way it would be possible to add an unbound socket to SocketSet and then to automatically add it to DispatchTable on bind or connect.

I'm also thinking of overriding bind and likes in this smart pointer so it could return errors like AlreadyInUse, which is not possible now.

Jun 26 '17 22:06 batonius

Hmm, I'm not sure if I like this idea very much, we already have drop magic in Device and that's pretty bad already. But I can give it a look.

Jun 26 '17 22:06 whitequark

I've got it working: https://github.com/m-labs/smoltcp/compare/master...batonius:packet_dispatch .

I'm not sure about names, and I think there are some public API ready to be cut out, there are no tests, the formatting is wrong, but overall this is the design I propose we adopt. Examples appear to be working, but I can't test them in no_std mode.

Smart-pointer approach works fine, I've even managed to implement it without RefCell, API changes are minimal, no_std mode is supported, adding egress dispatching should be easy.

Jun 28 '17 22:06 batonius

I've cleaned up the code, and I'm more or less happy with it. If you're ok with the approach I've taken, I'll create a PR so you can review the code and request changes.

Jun 29 '17 21:06 batonius

Added egress packet dispatching, the only thing I'm not sure about is the best way to determine if a TcpSocket is dirty. Right now I consider every TCP socket not in the CLOSED or LISTEN states as dirty, but limit the number of iterations over the dirty sockets list so we don't get an infinite loop. Overwise it appears to be working.

Jul 02 '17 14:07 batonius