[RFC] Packet dispatch
Problem
Currently, sloltcp uses iteration to dispatch ingress packets to sockets and to poll sockets for egress packets. While acceptable on the platforms without libstd and memory allocation, this approach is very inefficient on systems where growable containers are available.
Discussion
- The problem is two-fold: ingress packets should be dispatched by lookup tables based on their destination, while only "dirty" sockets should be polled for egress packets.
- After #18 is merged, there will be two kinds of sockets: l3 (raw) sockets should be dispatched to based on packet's IP version and IP protocol number, and l4 sockets (tcp and udp) should be dispatched to based on the dst port.
- Sockets can be bound to a specific IP address. As a result, a solution should support several sockets of the same type listening on the same port but bound to different IP addresses.
- A solution should keep the existing behavior when compiled without the
stdfeature.
Proposal
- Dispatch packets to sockets based on the port/protocol number only, leaving the question of sockets being bound to IPs to the
Socket::proccessmethod. - Expand
SocketSetby (conditionally) adding threestd::HashMap's to it, for raw, tcp and udp sockets, and aVecto keep dirty sockets' handlers. - Hash map for raw sockets should have a type of
std::HashMap<(RawSocketType, IpProtocol), Vec<Handle>>, hash maps for udp and tcp should have a type ofstd::HashMap<u16, Vec<Handle>>.Vecis required to support the possibility of several sockets listening to the same port/protocol on different addresses. - Extend
SocketSet::addto use socket'sendpointandip_protocolmethods to construct an index to insert socket's handler into the corresponding lookup table. - Extend
SocketSet::removeto remove socket from a lookup table based on socket'sendpoint. - Extend
Set::IterMutto support construction from a mutable reference toManagedSliceand an iterator ofHandles, producing an iterator of Socket. - Replace
SocketSet::iter_mutwithiter_raw(&mut self, ip_repr: &IpRepr) -> IterMutanditer_l4(&mut self, ip_repr: &IpRepr) -> IterMut, which should look up sockets based on ip_repr and return an iterator over the sockets found. With thestdfeature disabled, the functions should return iterators over the entireSocketSet. - Add
iter_dirtytoSocketSetto produce an iterator of Socket by drainingdirty_socketsvector, or an iterator over the entireSocketSetifstdis disabled. - Add
mark_dritytoSocketSetto mark a socket as dirty. - Replace
iter_mutinEthernetInterface::pollwithiter_rawanditer_l4and inEthernetInterface::emitwithiter_dirty.
Questions
- The only breakage of the current public API would be the requirement to explicitly mark written-to sockets as dirty. This can be avoided by having
SocketSet::get_mutmark each requested socket as dirty (which kinda defeats the purpose), by having a specialized methodSocketSet::get_mut_for_writeto do so, or by devising some smart handler type which would mark a socket as dirty onsend. - I'm not sure if the reference counting in
Set::Itemis relevant here. - I think
EthernetInterface::pollshould be refactored into several methods to increase readability.
Sockets can be bound to a specific IP address. As a result, a solution should support several sockets of the same type listening on the same port but bound to different IP addresses.
This already works, doesn't it?
while only "dirty" sockets should be polled for egress packets
I have an idea that doesn't involve even more conditional compilation. Add two storage items:
- Every socket should gain a "dirty" flag, internally usable by SocketSet;
- SocketSet should gain an additional storage area, sized the same as its socket list, structured as a ring buffer, containing indexes of sockets. Every time a clean socket is made dirty, an index of the dirty socket is added to that ring buffer. This makes it possible to do O(1) polling for egress packets.
Regarding ingress, using BTreeMap instead of a HashMap reduces the std requirement to a mere collections requirement, and makes it possible to do fast ingress handling on no_std code. Otherwise it does look like the only option to me.
This already works, doesn't it?
Yeah, I meant it for the packet dispatch mechanism, meaning a simple port -> socket map won't work.
I have an idea that doesn't involve even more conditional compilation
Right, we don't need a vector for this, we can use ManagedSlice.
Every time a clean socket is made dirty, an index of the dirty socket is added to that ring buffer.
How tho? Right now a Socket doesn't hold a reference back to the SocketSet it's in to report it's dirty and adding it would require Weak<RefCell<>> or something like this. I have some ideas I've sketched above but I don't like any of them.
using BTreeMap instead of a HashMap
Right, so far it looks like
#[cfg(any(feature = "std", feature = "collections"))]
#[derive(Debug)]
struct LookupTables {
raw: BTreeMap<(RawPayloadType, IpProtocol), BTreeSet<usize>>,
tcp: BTreeMap<u16, BTreeSet<usize>>,
udp: BTreeMap<u16, BTreeSet<usize>>,
dirty: Vec<usize>,
}
#[cfg(not(any(feature = "std", feature = "collections")))]
#[derive(Debug)]
struct LookupTables {}
#[derive(Debug)]
pub struct Set<'a, 'b: 'a, 'c: 'a + 'b> {
sockets: ManagedSlice<'a, Option<Item<'b, 'c>>>,
lookup_tables: LookupTables,
}
OK, this looks more or less fine to me. A few suggestions:
- Let's keep
SocketSetand what you callLookupTablesseparate. Right now, with theBTreeSet<usize>, there is a potential ABA problem. If you useSocketHandles from the public API ofSocketSetthen it is not an issue. - Let's call
LookupTablesasmoltcp::routing::SocketRouter, since that's what it does: it routes packets to and from sockets. In the future I can see there beingsmoltcp::routing::NodeRouter, and they will probably share some data structures even. - The API consumer of smoltcp has to uphold the contract of
SocketRouterto ensure that packets are promptly dispatched, that is, it has to set the dirty flag inSocketRouterat the same time as actually dirtying the socket. (This is somewhat ugly, but based on my experience wrapping smoltcp, it is not going to be a problem.) ThereforeSocketRouteris external to all of:SocketSet,Socket(of any kind), andEthernetInterface. In fact I suggest thatSocketRouterbe passed intopolljust likeSocketSet. - Almost always, there will be exactly one socket per port in the entire
SocketRoutersince multi-address setups are rare. How about using(u16, IpAddress)(note the order) as the key ofBTreeMap? This gets rid of allocations and indirections in toBTreeSet, which almost always are just wasting resources, but does not (I believe) pessimize the comparisons because almost always the comparison will return early after determining that the ports are not equal. Of course you still have to compare the IP address in the key but a 4/16 byte comparison should be cheaper than two pointer accesses.
Also, I encourage you to get early feedback by updating the PR in small pieces.
I don't like the idea of using routing in the names for packet dispatch. "Routing" is a well-defined concept and packet dispatch isn't usually a part of it. For starters, there are no "routes" involved :) I think something like smoltcp::dispatching::Dispatcher or DispatchTable would be a better name.
How about using (u16, IpAddress) (note the order) as the key of BTreeMap
This means we would have to do two searches, first with IpAddress set to the src_ip from the packet to search for an address-bound socket and then, if it fails (and it will more often than not), with IpAddress::Undefined to search for a not bound one. I'm not against it tho, it may be better from the API pov, since we would return an Option<Handler> and not an iterator.
Also, I think I'll make two PRs, since ingress and egress dispatching are mostly independent.
This means we would have to do two searches, first with IpAddress set to the src_ip from the packet to search for an address-bound socket and then, if it fails (and it will more often than not), with IpAddress::Undefined to search for a not bound one.
Nope! We can use BTreeMap::range to perform just one search.
I don't like the idea of using routing in the names for packet dispatch. "Routing" is a well-defined concept and packet dispatch isn't usually a part of it. For starters, there are no "routes" involved :) I think something like smoltcp::dispatching::Dispatcher or DispatchTable would be a better name.
Agreed. On further thinking smoltcp::socket::SocketDispatcher sounds good, since it is functionally adjacent to smoltcp::socket::SocketSet in every way.
Nope! We can use BTreeMap::range to perform just one search.
I don't get it. Suppose you have three sockets, *:123, 192.168.1.1:123 and 192.168.1.3:123. They are stored in the BTreeMap in this order. How would you construct a range to find the right socket for each address in 192.168.1.1..192.168.1.4?
How would you construct a range to find the right socket for each address in 192.168.1.1..192.168.1.4?
Why do you need this?
So here's my current attempt: https://github.com/m-labs/smoltcp/compare/master...batonius:packet_dispatch . I've spent some time on these lifetimes :)
What I'm thinking of doing next is
- Add
process_reprmethods to all the sockets which would be the same asprocessbut would accept tcp/udp/raw reprs. - Add l4 header parsing to
EthernetInterface::poll, callget_tcp_socketsorget_udp_socketsdepending on l4, and loop over the resulting iterator +filter_map(<Socket as AsSocket<TcpSocket>>::try_as_socket)callingprocess_repron the sockets. This way we won't be parsing l4 headers two times.
I'm not sure I like SocketDispater being a separate entity from the API pov, the usage in the tests isn't quite friendly. You have to first add a socket to a SocketSet and then not forget to add it again to a Dispatcher, this time requesting it from the SocketSet. If we ignore egress packets for now, hiding Dispatcher inside SocketSet looks like a clean and reasonable solution.
Add process_repr methods to all the sockets which would be the same as process but would accept tcp/udp/raw reprs.
You mean split process into the part that parses and accepts/rejects the packet and the rest? Hm.
Yes. Basically, we now need to parse l4 headers in EthernetInterface::poll to get the dst port to dispatch the packet anyway, so there's no reason to parse it again in Socket::proccess. These methods will be socket-specific, but we can iterate over specific sockets only with filter_map.
I'm not sure about the accept/reject part tho, since without std/collections we're still gonna iterate over all the sockets so we can't just assume the packet is for the socket in these new methods. We can have separate versions for std/no_std, but I'm not sure if it's worth it since the accept/reject check is just a couple of u32/u16 comparisons.
I'm not happy about this design. How about we factor out the IP handling machinery from EthernetInterface and then keep all of our L4 handling code there?
I'm all for it, I've stated EthernetInterface::poll as an issue in the OP :)
I propose I rename that I called SocketDispatcher to SocketDispatchTable, put it inside SocketSet, add iface/ip_dispatcher.rs with IpDispatcher and move all the IP processing logic (everything from the EthernetProtocol::Ipv4 branch) to it, refactoring it into several methods in the process, while keeping the EthernetInterface's interface intact. Again, I'm explicitly ignoring egress packets for now.
move all the IP processing logic (everything from the EthernetProtocol::Ipv4 branch) to it, refactoring it into several methods in the process, while keeping the EthernetInterface's interface intact
Let's start with this. It's long overdue, will have a lot of benefits, and necessary for this anyway.
Here's my attempt at refactoring EthernetInterface::poll: https://github.com/m-labs/smoltcp/compare/master...batonius:poll_refactor . I've tried to keep the logic intact, but I've made two changes I think are reasonable:
- We don't try to find a socket to process a packet with non-icmp/tcp/udp proto. Instead, we just send
PROTO_UNREACHABLEIf no raw socket processed it successfully. - We end the iteration over tcp/udp sockets after the first non-rejected one.
I haven't factored it out into a separate module as I was going to because these functions either have no state of their own or rely on EthernetInterface's internal state. With these changes, it should be trivial to plug in the socket dispatcher just by replacing iter_mut().filter_map with specialized search methods.
If you're ok with it, I'll make a PR so I can move on with my packet_dispatch branch.
Looks all good! I have a few nits but no real issues.
In fact I went ahead and merged it.
I propose I rename that I called SocketDispatcher to SocketDispatchTable, put it inside SocketSet
Ack on putting it into SocketSet.
Moving forward, do you think you can cover EthernetInterface with tests? A basic coverage of every Ok and Err returned from the process_* functions would be a great start, I'll chime in then, adding support for the newly landed rustc support for gcov.
Moving forward, do you think you can cover EthernetInterface with tests?
Sure, I'll look into it.
Something I just remembered that might be very relevant to your work is that having several open sockets in LISTEN state with the same local endpoint is perfectly legal. That's how listen backlog is implemented (by a layer on top of smoltcp).
Right, I somehow missed that point completely, it's not enough to dispatch tcp packets based on the dst endpoint, a server can have several clients connected to it, we need to consider src endpoint as well, and we don't know it until we established a connection. This means we need another layer of dispatching and a way for a socket to report the fact it has established a connection to a remote endpoint.
Now I think if it, it should be easy enough to do by checking if socket's remote_endpoit has changed after process in process_tcpv4.
Yeah that works.
I've updated https://github.com/m-labs/smoltcp/compare/master...batonius:packet_dispatch .
I'm not sure about the API yet, but my current plan is to add a smart pointer-like type that would hold mutable references both to Socket and DispatchTable and would implement Deref<Target=Socket> . SocketSet::get would return this smart pointer instead of a raw reference. The pointer should be able to check if Socket's endpoints have changed in drop, and if they have, update DispatcheTable accordingly. This way it would be possible to add an unbound socket to SocketSet and then to automatically add it to DispatchTable on bind or connect.
I'm also thinking of overriding bind and likes in this smart pointer so it could return errors like AlreadyInUse, which is not possible now.
Hmm, I'm not sure if I like this idea very much, we already have drop magic in Device and that's pretty bad already. But I can give it a look.
I've got it working: https://github.com/m-labs/smoltcp/compare/master...batonius:packet_dispatch .
I'm not sure about names, and I think there are some public API ready to be cut out, there are no tests, the formatting is wrong, but overall this is the design I propose we adopt. Examples appear to be working, but I can't test them in no_std mode.
Smart-pointer approach works fine, I've even managed to implement it without RefCell, API changes are minimal, no_std mode is supported, adding egress dispatching should be easy.
I've cleaned up the code, and I'm more or less happy with it. If you're ok with the approach I've taken, I'll create a PR so you can review the code and request changes.
Added egress packet dispatching, the only thing I'm not sure about is the best way to determine if a TcpSocket is dirty. Right now I consider every TCP socket not in the CLOSED or LISTEN states as dirty, but limit the number of iterations over the dirty sockets list so we don't get an infinite loop. Overwise it appears to be working.