[sketch] Timeline redux
TL;DR: scroll down for the pretty screenshot.
I have a new revision of the timeline code. This one is intended to come upstream very soon. This branch is a preview for early feedback, especially on the interface. I have rebased and split the work into logical commits. I have marked the branch as [sketch] to reserve the right to rebase again.
Timeline is a circular event log. Calling the logger is cheap (~5 cycles) and events are timestamped with sub-nanosecond precision. The log wraps around after one million events. Instrumenting Snabb with timeline logging makes it possible to do detailed offline profiling and performance diagnostics.
Here is how it works:
Define your events
First you define an "events file" like this excerpt from core/engine.events:
4|sleep_Hz: usec Hz
The engine requests that the kernel suspend this process for a period of
microseconds in order to reduce CPU utilization and achieve a fixed
frequency of breaths per second (Hz).
4|sleep_on_idle: usec
The engine requests that the kernel suspend this process for a period
of microseconds in order to reduce CPU utilization because idleness
has been detected (a breath in which no packets were processed.)
4|wakeup_from_sleep:
The engine resumes operation after sleeping voluntarily.
This example defines three events. They all have log level 4 (medium), which determines how frequently they are logged to the ring. Each event can have up to six 64-bit integer arguments. The long textual descriptions are stored only once in the log file (i.e. interned).
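For illustration, an app could ship an events file of its own in the same format: log level, event name, up to six argument names, then a free-form description. The file and event below (say, apps/myapp/myapp.events) are invented for this example:

4|myapp_batch_done: packets bytes drops
The hypothetical app has finished processing a batch, recording how many
packets and bytes it consumed and how many packets it dropped.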
Instrument your code
First you create a timeline log in shared memory and load the relevant event definitions:
timeline_log = timeline.new("engine/timeline")              -- create the shared-memory log
events = timeline.load_events(timeline_log, "core.engine")  -- load definitions from core/engine.events
Then you instrument your code with calls to log events as they happen:
events.sleep_on_idle(sleep) -- Log message before sleeping
C.usleep(sleep)
events.wakeup_from_sleep() -- Log message after sleeping
These events are timestamped with sub-nanosecond precision using the RDTSCP instruction. Here the log will show both the number of microseconds you requested to sleep (the value of the sleep variable) and the number of cycles that you actually slept (the timestamp delta between the two log messages).
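For illustration, here is a sketch of instrumenting the hypothetical event defined earlier. Only timeline.new(), timeline.load_events(), and the generated event functions are taken from the examples above; the shared-memory path, module name, and helper function are assumptions:

-- Hypothetical instrumentation sketch (paths and names are illustrative).
local timeline_log = timeline.new("myapp/timeline")
local events = timeline.load_events(timeline_log, "apps.myapp.myapp")

function MyApp:push ()
   local packets, bytes, drops = process_batch(self) -- app-specific work (hypothetical helper)
   events.myapp_batch_done(packets, bytes, drops)    -- up to six 64-bit integer arguments
end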
Generate your data
Now when you run Snabb the file /var/run/snabb/$pid/engine/timeline will contain your events. It is updated in real time as file-backed shared memory, so you can copy it for offline inspection either from a running process or from one that has already terminated.
The engine samples events by randomizing the log level at the start of each breath. For 99.9% of breaths there is no logging at all; 0.1% are logged at a high level (e.g. start/end of breath), 0.01% at a medium level (e.g. individual push/pull calls), and 0.001% at a low level (e.g. logs within apps and libraries). This is intended to produce roughly 1000 events per second and to wrap the million-entry log roughly every 15 minutes.
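To make the sampling concrete, here is a minimal sketch of the per-breath selection. This is assumed logic rather than the engine's actual code, and it leaves out how the chosen level is installed in the timeline; the thresholds reproduce the frequencies above if a more detailed selection also logs the coarser events:

-- Sketch only: draw one random number per breath and decide how much to log.
local function sample_breath_logging ()
   local r = math.random()
   if     r < 0.00001 then return 'low'    -- ~0.001%: also log events within apps and libraries
   elseif r < 0.0001  then return 'medium' -- ~0.01%:  also log individual push/pull calls
   elseif r < 0.001   then return 'high'   -- ~0.1%:   log start/end of breath
   else                    return 'off'    -- ~99.9%:  no logging this breath
   end
end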
Calling the log function costs around 5 cycles on average: most of the time logging is disabled and the call costs ~4 cycles; occasionally logging is enabled and the call costs ~44 cycles. The logger is written in assembler and looks like a simple FFI call to the JIT.
Analyze the data
The log contains one million entries, which is too many to read by hand. The idea is to perform statistical analysis of the logs and to compare different sets of logs (e.g. test lab data vs production data). Separate tooling will be available for this.
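As a rough illustration of the kind of aggregation such tooling might perform, here is a sketch that reports cycles per packet grouped by app and NUMA node. It assumes the log has already been decoded into Lua records; the field names (app, numa, packets, cycles) are hypothetical and not the real on-disk format:

-- Sketch only: aggregate decoded timeline records (hypothetical fields).
local function cycles_per_packet_by_app (records)
   local stats = {}
   for _, r in ipairs(records) do
      if r.packets and r.packets > 0 then
         local key = ("%s/numa%d"):format(r.app, r.numa)
         local s = stats[key] or {cycles = 0, packets = 0}
         s.cycles, s.packets = s.cycles + r.cycles, s.packets + r.packets
         stats[key] = s
      end
   end
   for key, s in pairs(stats) do
      print(("%-24s %10.1f cycles/packet"):format(key, s.cycles / s.packets))
   end
end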
Here is an example analysis of the cycles-per-packet cost depending on which app is being called, how many packets are on its input links, and which NUMA node it is scheduled on. This data was generated from a directory containing 10 timeline files from Snabb processes with varied workloads and NUMA affinities (no special preparation needed).
(screenshot: cycles per packet by app, input link occupancy, and NUMA node)
Graphs like this (and many others over time) will be easy to produce from a directory of timeline files.