Cleanup without supervisor trees
It seems that in Snabb we have been moving towards an Erlang-like cleanup model, where there is a hierarchy of processes in which "higher" processes get notified when "lower" processes exit (however they exit), and where the "higher" process is somehow more responsible and can handle the cleanup. See https://github.com/snabbco/snabb/blob/master/src/core/main.lua#L175 and specifically the shutdown() call there.
In the Igalia/lwaftr branch, we incorporated @lukego's multiprocess work from #1021. Luke was doing this work in the context of a device driver in which the main process would set up queues and allocate packets and then distribute those queues to worker processes. Luke needed a reliable way to restore PCI device bus mastering settings, so he added some code (https://github.com/snabbco/snabb/pull/1021/files#diff-1bad3916084abe11aeb847aa87fefcccR148) to make the supervisor process disable bus mastering if needed, for example if the worker process exited unexpectedly.
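For concreteness, here is a minimal sketch of the pattern being described -- not the actual main.lua or #1021 code, just the general shape, using only ljsyscall; the cleanup body is a placeholder for whatever the parent owns (for example, disabling bus mastering):

local S = require("syscall")

-- Sketch of supervised cleanup: the parent forks a worker, waits for it to
-- exit by whatever means (normal exit, error, signal), then runs cleanup.
local function supervise (run_worker, cleanup)
   local pid = S.fork()
   if pid == 0 then
      run_worker()      -- child: the data-plane work
      os.exit(0)
   end
   S.waitpid(pid, 0)    -- parent: returns however the child exits
   cleanup()            -- e.g. disable PCI bus mastering for the device
end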
This turns out to be at the root of a problem now that we are attempting to merge Igalia/lwaftr back to master; see https://github.com/snabbco/snabb/pull/1133#issuecomment-301839853. The intel_mp tests were failing, and rightly so. Nice work by @petebristow in creating these tests, and by @eugeneia in making sure that they actually run and gate merges to master. The issue is that the intel_mp model isn't structured as a tree of processes; the process-that-turned-on-bus-mastering-is-the-one-to-turn-it-off logic doesn't work there, as the expectation is that queue handlers come and go at will (I think).
My question would be, is this the design we are looking for? Obviously the short-term fix is to just exempt the intel_mp code from the reliable-cleanup-via-supervision-tree mechanism. But then intel_mp isn't taking advantage of the reliable cleanup code -- a kill -9 or a segfault in the data plane would not reset the card in any way.
I propose that we adopt the core.worker multiprocess model for spawning cooperating Snabb processes, provided that this will suit @petebristow. This framework did not exist when @petebristow was developing the intel_mp driver and his application, and so it is natural that he didn't use it, but now that it works (and is ready to land on master via #1133) I am hoping that it will suit him too.
I also think the "master process spawns tree of workers" model will be able to accommodate more NICs in a consistent way going forwards. Some cards allow the simple peer-to-peer process structure, like 82598 and I350, while others really want to be at least partly configured from a kernel-like central point, like ConnectX and FM10K. If we have a non-realtime parent process that is able to implement the shared setup functionality then we should have all of the bases covered.
Pete, whaddayareckon?
I have belatedly pushed the API documentation for the multiprocess-engine branch (#1021) here: https://github.com/snabbco/snabb/blob/82f335573a743fd608baa5ee966d3c5ef35183ab/src/README.md#multiprocess-operation-coreworker.
@petebristow How does that API look to you?
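For anyone reading along, here is a rough sketch of how that API is used. The fields returned by worker.status() (pid, alive) and the child module shown are my assumptions, so check the README above rather than this:

local worker = require("core.worker")

-- Start a named child process running a chunk of Lua code.
-- ("program.example.rx" is a made-up placeholder module.)
worker.start("rx1", [[ require("program.example.rx").run() ]])

-- Inspect the children from the parent; the pid/alive fields are assumed.
for name, status in pairs(worker.status()) do
   print(name, status.pid, status.alive)
end

-- Terminate a child by name.
worker.stop("rx1")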
I have been having a look at how to integrate intel_mp with the core.worker module. I hope it would be straightforward to replace the shell code that runs the Snabb processes with Lua code in a "main" process that starts them with core.worker.
Here's a quick example of how one of the test cases might look:
local worker  = require("core.worker")
local counter = require("core.counter")
local ffi     = require("ffi")
ffi.cdef("unsigned int sleep(unsigned int seconds);")

function test_10g_1q_blast ()
   -- apps.intel_mp.selftest is a sketch of a wrapper around the test programs.
   local header = [[ m = require("apps.intel_mp.selftest") ]]
   worker.start("testsend", header..[[ m.testsend("SNABB_PCI_INTEL1") ]])
   worker.start("testrecv", header..[[ m.testrecv("SNABB_PCI_INTEL0") ]])
   ffi.C.sleep(1)
   -- Read the receive counter that the testrecv worker publishes via shm.
   local RXDGPC = counter.open("group/intel_mp_test/testrecv.RXDGPC")
   assert(counter.read(RXDGPC) > 10000)
end
which is translated from the shell code:
#!/usr/bin/env bash
SNABB_SEND_BLAST=true ./testsend.snabb $SNABB_PCI_INTEL1 0 source.pcap &
BLAST=$!
SNABB_RECV_SPINUP=2 SNABB_RECV_DURATION=5 ./testrecv.snabb $SNABB_PCI_INTEL0 0 > results.0
kill -9 $BLAST
test `cat results.0 | grep "^RXDGPC" | awk '{print $2}'` -gt 10000
exit $?
Could this be a reasonable approach? And if it works for translating the intel_mp test cases, then maybe it would also work for @petebristow's real applications?
There is one trick here for having one Snabb process read a counter value from another. I suppose a simple solution would be for the worker to create an shm alias (symlink) from its RXDGPC counter into the shared group/ shm directory. That way any process in the group could access the counter using a well-defined name (I prefixed it with the name of the worker process). Just an idea.
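To make the alias idea concrete, here is a sketch using ljsyscall; publish_counter() is a hypothetical helper, and the on-disk details (shm.root/<pid>/..., the ".counter" suffix, the per-process group/ link) are my assumptions about the shm layout rather than documented API:

local S   = require("syscall")
local shm = require("core.shm")

-- Hypothetical helper: expose one of this process's counters under a
-- well-known name in the shared group/ directory by symlinking it there.
local function publish_counter (local_name, group_name)
   local mydir = shm.root.."/"..S.getpid()
   S.symlink(mydir.."/"..local_name..".counter",
             mydir.."/group/"..group_name..".counter")
end

-- e.g. in the testrecv worker (assuming group/intel_mp_test/ already exists):
-- publish_counter("apps/nic/RXDGPC", "intel_mp_test/testrecv.RXDGPC")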
@wingo I wonder actually whether the YANG support would be suitable even for such simple cases as test suite programs? The intel_mp tests run a few related programs, need to provide some basic configuration data (PCI address, delay before sending traffic, whether to loop traffic), and need to extract some operational data (NIC counters).
It would be quite neat if we are ready to depart from the world of passing configuration data via environment variables and extracting operational data using awk!
Judging by the lib.yang doc on the lwaftr branch (#1133) the mechanism for exporting operational data from Snabb counters to YANG models is not there yet? (So maybe no good way yet for one process to read the RXDGPC counter from another?)
Good morning, early riser @lukego :) You could certainly describe the configuration with a custom YANG schema. That would automatically define a textual syntax for expressing configurations, and the yang code can easily produce "normal" Lua objects from such a configuration that the test could use directly.
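Here is a sketch of what that could look like for the intel_mp test configuration. The schema is made up for illustration, and yang.load_schema / yang.load_data_for_schema are my assumptions about the lib.yang entry points; the exact textual syntax and Lua field names may differ a little:

local yang = require("lib.yang.yang")

-- A made-up schema covering the configuration items mentioned above.
local schema = yang.load_schema([[
module intel-mp-test {
   namespace "snabb:intel-mp-test";
   prefix "imt";
   leaf pci-address  { type string; mandatory true; }
   leaf spinup-delay { type uint32; default 0; }
   leaf loop-traffic { type boolean; default false; }
}
]])

-- Parse a textual configuration into plain Lua values for the test to use.
local conf = yang.load_data_for_schema(schema, [[
   pci-address "82:00.0";
   spinup-delay 2;
   loop-traffic true;
]])
print(conf.pci_address, conf.spinup_delay, conf.loop_traffic)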
As far as counters go, I think that document is a little out of date. We have a working thing that's not very organized; really a prototype. What it does is look for any state data in a YANG schema, collecting all non-"config" leaves in the schema. Counters under the apps/ subtree whose names match these state leaves are exported when building a state tree, e.g. via https://github.com/Igalia/snabb/blob/lwaftr-2017-04-24/src/lib/yang/state.lua#L93. This tree of Lua values can be serialized to a string using the API, or using snabb config get-state if the program has "snabb config" support.
Looking at it now I see that it can be better in many ways :)
@wingo Neat :)
So if one Snabb process wanted to provide configuration to another, could it construct the configuration as a Lua object and then call yang.print_data_for_schema_by_name() to write that out to a configuration file? That would be instead of putting the values into environment variables or directly into the Lua code passed as a string to worker.start().
Yes certainly. You'd need a schema of course but maybe we're heading that way :)
For higher-bandwidth communication you can serialize to the binary format instead, but for this use case the text format sounds fine.
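To tie it back to core.worker, here is a sketch of how that hand-off might look. The schema name, the argument order of print_data_for_schema_by_name, and load_data_for_schema_by_name as its loading counterpart are all assumptions on my part, and apps.intel_mp.selftest is still the hypothetical wrapper from the earlier example:

local yang   = require("lib.yang.yang")
local worker = require("core.worker")

-- Parent: build the configuration as a plain Lua object and serialize it
-- with the schema-driven text format (argument order assumed).
local conf = { pci_address = "82:00.0", spinup_delay = 2 }
local file = io.open("/tmp/testrecv.conf", "w")
yang.print_data_for_schema_by_name("intel-mp-test", conf, file)
file:close()

-- Parent: start the worker, passing only the file name instead of values
-- embedded in environment variables or in the Lua code string.
worker.start("testrecv", [[
   local yang = require("lib.yang.yang")
   local src  = io.open("/tmp/testrecv.conf"):read("*a")
   local conf = yang.load_data_for_schema_by_name("intel-mp-test", src)
   require("apps.intel_mp.selftest").testrecv(conf)
]])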