snabb
snabb copied to clipboard
Create new "snabb" module to define our API
I think it is important to create a unified API accessible via the module snabb.
This would accomplish a great many things:
- Define a supported interface with semantic versioning and consistent user documentation.
- Define - by exclusion - the unsupported interface that represents engine internals.
- Decouple the naming of modules and apps from the source code layout e.g.
snabb.app.intel10gvsapp.intel.intel_apporsnabb.ctableinstead oflib.ctable.ctable. - Avoid
require()by having everything in a ready-available namespace (snabb.engine,snabb.packet,snabb.ffi,snabb.bit). - Avoid the need to cache modules/functions with
local bit = require("bit") local lshift = bit.lshiftby representing thesnabbtable with FFI metatables (eagerly bound) rather than ordinary Lua tables (late bound). - Could even be possible to use
setfenv()to avoid having to type thesnabb.prefix...?
I tried this once before in #465 but didn't manage to fully develop the idea at that time. Since then some things seem clearly, for example it seems that we do trust allocation sinking enough to continue returning FFI pointers from core functions like packet.receive(), while other things have become murkier, for example this new idea that we could keep the syntax module.function(x, ..) instead of x:method(...) and still get the performance benefit of FFI metatable early binding simply by representing the module as an FFI object.
Following up #733 and cc @wingo @eugeneia
Is there anybody who wants to take the lead on this? I can contribute strong opinions...
If no one has stepped up by March, I'm interested. I'd really like to see this.
@kbara it would be awesome if you take the lead on this!
I do not like being negative but I am not so sure that this is a good idea :) If we move the implementation of our code to this module layout, great. However anything that adds an abstraction between what code I'm using and where it's implemented is less great. When I see that a module does require('lib.foo') I know where to go and look for that code; if it's some synthetic API, not so much. How we build that API could mitigate that, but I wonder if the extra abstraction would not be a mistake.
@wingo The main value I see in this activity is to ask ourselves, "How should Snabb application code really look?"
Consider a really simple example like example_spray.lua:
module(..., package.seeall)
local pcap = require("apps.pcap.pcap")
local sprayer = require("program.example_spray.sprayer")
function run (parameters)
if not (#parameters == 2) then
print("Usage: example_spray <input> <output>")
main.exit(1)
end
local input = parameters[1]
local output = parameters[2]
local c = config.new()
config.app(c, "capture", pcap.PcapReader, input)
config.app(c, "spray_app", sprayer.Sprayer)
config.app(c, "output_file", pcap.PcapWriter, output)
config.link(c, "capture.output -> spray_app.input")
config.link(c, "spray_app.output -> output_file.input")
engine.configure(c)
engine.main({duration=1, report = {showlinks=true}})
end
There are a bunch of aspects to this module that we may (or may not) want to reconsider:
- Using
module(). Better way? - Using
require(). Useful or senseless boilerplate? - Cache modules with
local. Important or not? (Less verbose alternative available?) - Calling functions (
config.app(c, ...)) rather than methods (c:app(...)orc:config_app(...)). Is this the way that we prefer? (See also #740).
I think it would be a valuable exercise to review these kind of issues periodically and see which ones we have agreement on, which ones are still contentious, and to capture this in a well-versioned API definition that provides some basic levels of forwards/backwards compatibility (semver).
I frame this as creating a snabb module but it needn't be so. However, let me just sketch for the sake of illustration another way that the example above could be written:
if not (#snabb.parameters == 2) then
print("Usage: example_spray <input> <output>")
main.exit(1)
end
local input = snabb.parameters[1]
local output = snabb.parameters[2]
local c = snabb.config()
c:app("capture", snabb.app.pcap_reader, {path = input})
c:app("spray_app", snabb.app.sprayer)
c:app("output_file", snabb.app.pcap_writer, {path = output})
c:link("capture.output -> spray_app.input")
c:link("spray_app.output -> output_file.input")
snabb.engine:configure(c)
snabb.engine:main({duration=1})
... and I am not arguing that the second example is better, only that the first example showing current practice represents a lot of choices that were made in isolation and never systematically revised.
EDIT: Shortened second example for better effect.
EDIT2: Adopted snabb.app.foo syntax for referring to app classes.
Torch is another LuaJIT application similar to Snabb Switch. Their API documentation is worth checking out, for example the Tensor object. IIRC the torch module was the inspiration for having a snabb module.
Quite nice that they also think about, and document, the printed representations of their core objects and make everything REPL friendly.
That looks like a pretty good example of some of the things I miss most working with Snabb. It'd be great to move more in that direction, especially as core lib.protocol APIs and similar stabilize.
@lukego You realize that Torch7 was originally (co-)developed just a few hills away from you at IDIAP in Martigny? I would have suggested that you go visit those folks to chat about their API design, except they all seem to work for the FaceTwitGoog now...
@sleinen Yeah :).
I did ask their advice (via mailing list) before adopting the "busybox" structure of a unified executable that includes everything. They went the opposite direction and decomposed their software into libraries that are installed separately by the "luarocks" framework perhaps via the OS distribution. I don't think that approach would have worked for us though, too much putting our fate into the hands of others.
Imho we have found a reasonably good middle ground between independent libraries and framework-like structure. I feel like core.* should be the subject of the API covered in this issue, while lib.* and apps.* should remain libraries. My reasoning is that it would be worthwhile to have multiple libraries or apps (e.g. lib.foo and lib.bar) that solve the same problem in different ways. @capr recently explained to me the virtues of libraries vs frameworks, and my takeaway was that with a set of libraries the “user” can choose what subset he uses freely because libraries are designed to be composable while an opinionated framework may paint the user into a corner because its designed to be used as a whole, e.g. “all or nothing”.
The above may be a bit tangential, but I see an approach that minimizes any risks that come with designing snabb.*: if we start with core.* and add layers to the onion step-by-step we can review at each step and think about if we liked the onion without the next layer better than the onion with the next layer. Makes sense?
@eugeneia On the one hand I agree that the overall structure may be fine and having "more than one way to do it" for libraries is healthy. Even so I would like to consider coming up with a neater way to refer to the libraries and apps that we already have.
For example I would much prefer to write this:
snabb.app.vhost_user
than this:
local vhost_user = require("apps.vhost.vhost_user")
vhost_user.VhostUser
where in the second example I have typed the word "vhost" five times in order to refer to a common app.
Likewise if all other things were equal I would much prefer to write this:
snabb.link.transmit(...)
than this:
local link = require("core.link")
local link_transmit = link.transmit
link_transmit(...)
where again I typed the word "link" five times in order to make one function call.
This kind of thing seems to me worth fixing and also best resolved with a "How do we really want the code to look?" survey.
(Side-benefit: shorter and less redundant source would make the REPL much more convenient to me. Even just typing ffi = require("ffi") every time I run snsh -i makes me crazy.)
Further thought:
The OO convention documented on #740 could also help to relieve verbosity.
Procedural code accessing a snabb module might end up more verbose even without the require() calls:
local link = snabb.link.new()
snabb.link.transmit(link, p)
snabb.link.receive(link)
and this might be better with object-oriented code:
local link = snabb.link()
link:transmit(p)
link:receive()
... but these are surely not the only options available.
My 2c on initial points:
point 1. re versioning the API: something like love2d? that's a good example of versioned documentation IMHO.
points 3. 4. 5. re decoupling source file paths from namespaces: I would actually do the reverse: put an __index on the snabb table that loads submodules automatically so that accessing the unknown key snabb.link triggers snabb.link = require'snabb.link'. -- and yes, I know what you're thinking 1) that's magic -- yes, but you don't have to update a global table everytime you add a submodule; 2) that's slow -- but in my experience LuaJIT always hoists out these things from loops so I never really understood the need for local bor = bit.bor with LuaJIT. Am I wrong about this? Did anyone ever manage to speed up a loop by caching loop-constant things in locals?
point 6. I used this pattern a lot, works great.
Did anyone ever manage to speed up a loop by caching loop-constant things in locals?
Yep, quite often. The reason: loop-constant things are indeed hoisted to the loop pre-header, so in the main loop trace you don't need to reload them. But if you have a side trace, it rejoins the loop at the pre-header, and so has to evaluate the invariants again. Manual hoisting avoids a penalty for side traces.
An extreme example of what Andy mentions is lukego/blog#8.
@wingo @lukego thanks, I'll look into this. I wish I'd have a mental model of what side traces are and when they happen.
@mraleph hinted that there may be a simple way to avoid the overhead of side-traces rejoining the parent trace. I posted a bounty on LuaJIT via BountySource in case anybody is inspired to make that work. See https://github.com/LuaJIT/LuaJIT/issues/114.
I would not say the idea (replying pre-header into the side-trace) is completely simple - it comes with a lot of tiny things to figure out (one of the most important things is how to avoid increasing the size of generated code to much).
what a can of worms I opened :) I understand the need for the metatype early-binding hack now, although TBH I would still cache on an as-needed basis, close-to-the-loop and documenting why I'm doing it in each case.
...and now for the comment that I don't expect anyone to take seriously (i.e. what I'd really like to happen):
- break snabb into separate libs, each lib at
github.com/snabb/<lib>- if not, at least break programs away from it
- remove all hierarchy: put all modues flat in
snabb/<module>.lua - let snabb work as a lib with user-provided luajit 2.1, and make it clonable in user's LUA_PATH
- document each module in
snabb/<module>.md- better yet, move it all to github wiki
I know it sounds crazy, but I speak in good faith :) I have a 500-modules-in-100-repos project that works like this and I wouldn't go back to a single-repo setup or hierarchies.
@capr Short answer: that ship has already sailed :).
Long answer:
I thought seriously about this approach a year ago, see snabb-devel post. I understand that it makes sense for other projects. I vastly prefer the approach that we have now though where Snabb is an atomic application that tracks all of its dependencies very carefully. This eliminates large classes of errors and support problems.
Sometimes Snabb requires a really recent version of LuaJIT e.g. a specific fix that was made only days ago (example). It is very powerful to be able to resolve those dependencies once-and-for-all in the main repo rather than dealing with support requests due to changes made downstream ("I ran Snabb with LuaJIT 2.0 and <...completely predictable problem that should never have happened...>").
Still: we could spin off parts of the code into separate libraries and track them as submodules, the same way that we use ljsyscall, pflua, etc. Certainly I think it makes sense that the Igalians wrote pflua as its own library rather than exclusively a part of Snabb.
There have also been some related ideas like compiling a libsnabb.so for use in C++ applications (#606) but this has not taken off and is a lot of work to do right (e.g. audit all code for thread-safety/reentrancy even though this is not relevant to the Snabb application itself.)
My own thoughts are moving in the opposite direction. I want to add musl-libc as a submodule and use fully static linking to break our dependency on Glibc.
@capr well that is my off-the-cuff response anyway. Since this PR is all about looking at things with fresh eyes we can see if other people see things differently too.
what a can of worms I opened :) I understand the need for the metatype early-binding hack now, although TBH I would still cache on an as-needed basis, close-to-the-loop and documenting why I'm doing it in each case.
This is very hard to predict though. In practice we will either do it too often or too seldom. Code reviews will haggle about "is it really worth caching this?" or "are you sure you shouldn't cache that?". I would be much happer if we could find a simple and consistent solution that doesn't require too many tiny choices. (That even seems to bother Mike Pall).
Javier once went through the code and replaced a bunch of tables (e.g. packet, link, etc) with FFI metatables (early bound). He observed ~5% speedup on some non-trivial benchmarks. This suggests to me that we are paying a significant price of unused late-binding today (see snabb-devel post).
Alternative idea: we could put all of our modules directly into the global namespace instead of a snabb module.
Here is how the example above might look in that case:
if not (#args == 2) then
print("Usage: example_spray <input> <output>")
main.exit(1)
end
local input, output = unpack(args)
local c = config()
c:app("capture", app.pcap_reader, {path = input})
c:app("spray_app", app.sprayer)
c:app("output_file", app.pcap_writer, {path = output})
c:link("capture.output -> spray_app.input")
c:link("spray_app.output -> output_file.input")
engine:configure(c)
engine:main({duration=1})
This would be in the style of Lua's own native model where the basic APIs are globally available (string, table, math, etc) and don't need to be required. The extra brevity could also make us more REPL-friendly.
(Note: The core modules are already in the global namespace today, e.g. packet, but people don't use them that way for fear of expensive lookups when referencing the module on every call.)
I found this discussion while looking for a way to improve code readability.
I share @wingo's concerns about decoupling the API from the code's actual structure: that would impair code maintainability. A cleaner, if more arduous, way would be to actually restructure the package layout.
I also share @eugeneia's point that there's value in keeping libraries independent from the core, and able to be used and evolve independently from it.
Momentarily disregarding the public API and package layout issue, I'd like to first concentrate on namespacing and module interdependence, for which I created a new discussion: #1132.