eth10g icon indicating copy to clipboard operation
eth10g copied to clipboard

Routing tables: A proposal

Open ZipCPU opened this issue 11 months ago • 14 comments

While the current routing algorithm used by the eth10g controller is blazing fast, is not very configurable. Worse, it requires a lot of logic to manage. Further, the current routing method will only work on Ethernet headers, not IPv3 or IPv4 addressing, or IP/UDP or IP/TCP protocol information (ports). For this reason, a highly configurable memory-based routing table is of interest.

Many routing methods exist in the literature, often parameterized by how the table is structured. A user configurable structure, which can be optimized in many ways, might therefore be valuable.

I therefore propose the following routing table approach. It's based upon 32-bit "instruction" words, which will feed a small FSM based router.

The router shall maintain 3 registers: a "program counter", controlling where the next table word is read from, a "shift register", containing a 128b window into the current packet, and a 128b accumulator containing the results from comparing table values with the 128b window into the current packet. Once comparisons are complete, the router can either 1) jump to a new instruction on a match, 2) jump to a new instruction on a failed match, 3) route to a given address a) always, b) on a match, or c) on a failed match.

The current draft instruction set is shown below:

tblrouter

MAC Example

To match against an Ethernet MAC 86:33:24:17:bf:dd and send it to target "8", the table would have instructions:

CMP (shift=12, IMM=0x863324) -- match against the first 24 bits
OR (shift=9, IMM=0x17bfdd) -- continue matching the second set of 24 bits
RZ 8 -- on "zero" (i.e. a match), route this packet to target "8"

IP Example

To match against an IP packet and route subnetwork 192.168.3.X to target 3, the table would use two steps. The first would determine if the packet were (or were not) an IP packet:

CMP (shift=2, IMM=0x0800)
MASK (shift=2, IMM=0x0ffff) -- make sure bits [23:16] of the prior command don't affect the comparison
JZ (ip_table) -- to jump to an IPv4 processing table

The IP table would then have:

STEP 14 (Shift 14 bytes through, bypassing the rest of the ethernet header)
CMP (shift=1, IMM=192.168.3) (Compare the IPv4 destination, bits [8 +: 24], against the given address)
RZ 3 (Route on a successful comparison to target "3")

Routing to a second destination, such as 192.168.17.X, to target 1 would only take two further instructions:

CMP (shift=1, IMM=192.168.17)  (Compare bits[8 +: 24] against the given destination address)
RZ 1 (Route on zero to target 1)

All other packets might then be routed to target 0.

RA 0. (Route Always to target zero)

In these examples, the "targets" will be defined with local meaning. For example, target zero might be set up as a catch all to be sent to a CPU for inspection, and potentially updating the routing table.

Performance monitoring

Performance counters are maintained within the table. Upon hitting a performance counter, the performance counter instruction will be re-read, and atomically incremented. How this happens will be dependent upon the bus in use. For example, on WB, the Wishbone cycle line will be held high while the performance counter is ready, a one added to it, and then the counter is written back to memory. With AXI, an exclusive access operation can be used to guarantee an atomic increment. This helps to insure that both CPU and router, and potentially other concurrent routers, will always have a consistent view of the performance counters.

Potential issues

The most obvious "issue" with this router is that it is memory based. Memory will need to be read to determine the optimal routing. While this is taking place, the packet will be forced to "stall" and wait for routing to complete. As long as packets are stored in virtual FIFOs (as they are within this design), such stalls shouldn't cause packets to drop, but they will delay packets moving through the system.

Because this approach is designed to look and feel like an instruction stream, a standard CPU instruction fetch may be used to feed it. This lends open the possibility of repurposing a pre-built cache.

As with any CPU, branches will cause pipeline stalls and hence delays.

There is no requirement in this proposal that any specific type of memory be used. The memory used could therefore come from on-chip memory, or equivalently an off-chip SDRAM type of memory, with expected performance advantages and disadvantages naturally following.

Conclusions

This table based "instruction-set" solution solves several problems with the current approach, and so I offer it here for anyone to review and/or discuss.

Dan

ZipCPU avatar Nov 12 '24 17:11 ZipCPU