Genericize tracing interface
Ideally, Zig internally should support multiple plugable tracing frameworks.
The current Zig tracing interface is exclusively targeted to https://github.com/wolfpld/tracy. There are a few other competing tracing frameworks, primarily https://github.com/opentracing, which provide similar functionality with slightly different semantics.
Isn't opentracing more focused around tracing remote procedure calls rather than in-process?
current Zig tracing interface is exclusively targeted to https://github.com/wolfpld/tracy.
To be clear, the current self-hosted compiler tracing interface is. There is no tracing interface in the standard library (yet).
Anyway, that said, I agree, this is one of those things that would be nice for the standard library to standardize on. I think we could do it similarly to how std.log works. Idea being, you could leave the trace calls in your code, and they will end up as no-ops unless activated by overriding the default tracing function.
Should we also do this for emulation/virtualisation? Special-casing Wine and QEMU seems like a suboptimal solution.
OpenTracing is now https://opentelemetry.io/
Spec: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md Go Client: https://pkg.go.dev/go.opentelemetry.io/otel/[email protected] Example: https://github.com/open-telemetry/opentelemetry-go/blob/master/example/basic/main.go
Hi, I created trace.zig which may be helpful to implement tracing in the standard library: https://gitlab.com/zig_tracing/trace.zig
Whether some of the ideas can be used as inspiration or avoiding the mistakes I did.
I've been writing a zig module for the opentelemetry tracing specification. Some thoughts:
- Tracy is focused on instrumenting a single binary while doing performance optimization work.
- OpenTelemetry Traces are focused on tracking a user request as it goes from client to server and server to other servers. You are expected to leave it enabled in production to monitor the health of a service.
- There are two potential places to implement a Tracy backend, as the OpenTelemetry separates the API specification from the SDK specification. One would be a Tracy implementation of the API, and the other a Tracy SpanExporter for the SDK.
Questions:
- Should the standard library API aim for OpenTelemetry spec compliance?
- Should the standard library provide an SDK for that API? E.G. should it provide a implementation that sends data to an OpenTelemetry collector and/or an implementation that send data to Tracy?
- Should the standard library have APIs for the other "signals" that OpenTelemetry defines (metrics, baggage, the WIP profiling)?
- If we make OpenTelemetry compatible APIs, should they live under
std.otel?
For now I will continue work on making an API targeting spec compliance, and I plan on submitting a pull request at some point.
A comparison of the tracing API vs std.log:
- The OpenTelemetry
Tracertype is similar tostd.log.scoped, giving developers a way to separate where aSpanoriginates from.std.log.scopeduses an enum literal for the name, while aTracerrequires anameand allows the additional parametersversion,schema_url, andattributes. - OpenTelemetry is object oriented, and has you get the default
TracerProvider, use that to create aTracer, then use that to create aSpan. I think zig would want something more likeconst tracer = std.trace.scoped(.std),tracer.createSpan().
A comparison of the OpenTelemetry tracing API vs zig/src/tracy.zig:
zig/src/tracy.zigtracerequires a source location, creating an OpenTelemetrySpanrequires aname, and only supports source locations with an "experimental" attribute group- Presumably Tracy's zone color could be mapped from an attribute like
tracy.zone.color.
I reviewed a couple profiler/tracing APIs.
- tracy profiler: C++ and C performance profiling library
- palanteer: C++ and Python performance profiling library
- optick: C++ performance profiling library
- OpenTelemetry: Cloud Native Computing Foundation (CNCF) specification
- zig-tracer: nektro's zig library for tracing
- Linux Kernel
tracepoints: C Macros for attaching probe functions
Here is a proposal for a tracing API for the standard library:
//! std.trace, spans
const trace = @This();
const root = @import("root");
/// An opaque type that will be passed to `std.trace.end` to indicate that a zone has ended.
///
/// - The default no-op backend returns an empty struct type
/// - Tracy returns a struct containing a u32 id and a c_int active flag
/// - a (theoretic) `palanteer` backend would return src.fn_name
pub const SpanToken = if (@hasDecl(root, "SpanTokenType")) root.SpanTokenType else DefaultSpanToken;
pub const BeginFn = fn(comptime scope: @Type(.enum_literal), comptime src: SourceLocation, args: anytype) SpanToken;
pub const EndFn = fn(SpanToken) void;
pub const begin = if (@hasDecl(root, "trace_begin")) root.trace_begin else defaultBegin;
pub const end = if (@hasDecl(root, "trace_end")) root.trace_end else defaultEnd;
pub fn scoped(comptime scope: @Type(.enum_literal)) type {
return struct {
pub fn begin(comptime src: SourceLocation, args: anytype) SpanToken {
return trace.begin(scope, src, args);
}
};
}
// Default no-op implementations
pub const DefaultSpanToken = struct {};
pub fn defaultBegin(comptime scope: @Type(.enum_literal), comptime src: SourceLocation, args: anytype) SpanToken {
_ = scope;
_ = src;
_ = args;
return .{};
}
pub fn defaultEnd(ctx: SpanToken) void {
_ = ctx;
}
Note that root.SpanTokenType, root.trace_begin, root.trace_end can't all be merged into std.Options or else we get a cyclic reference error.
General usage would look like this:
const std = @import("std");
const tracer = std.trace.scoped(.a_scope_that_is_hopefully_very_unique);
pub fn main() !void {
const span = tracer.begin(@src(), .{});
defer std.trace.end(span);
childFunction();
}
pub fn childFunction() void {
const span = tracer.begin(@src(), .{
.tracy_color = 0x0000FF, // make it a ghastly blue, tracy
});
defer std.trace.end(span);
}
This would cover most of the usecases out there, especially in libraries that may not want to add a dependency on a specific profiler.
What this doesn't cover:
- Allocation tracing
- Frame marking
- Attaching additional metadata after the span has started
- Checking if a span is recording
- Virtual Threads
- Lock tracing
Some of the stuff mentioned above would be useful to have, but perhaps as a separate proposal.
What I'm thinking of doing after:
- Investigating an abstraction similar to the Linux kernel's
tracepoint - Instrumenting the
GeneralPurposeAllocatorby defining an API for Allocations- Or perhaps a more general aggregating metrics API a la OpenTelemetry
I will probably start working on a pull request, but feedback is welcome.
Spent the weekend learning about SystemTap static probes. I think they're pretty neat! I'd heard of DTrace before, but digging into the details has been kind of mind blowing to me. You add a nop instruction your code, and then add a ELF note that looks like this:
stapsdt 0x0000002e NT_STAPSDT (SystemTap probe descriptors)
Provider: main
Name: loop
Location: 0x00000000010019e9, Base: 0x0000000001000238, Semaphore: 0x0000000000000000
Arguments: 8@112(%rsp)
And boom! Now you can break on a specific probe, even in a stripped ELF binary:
(No debugging symbols found in ./zig-out/bin/infinite-loop-example)
(gdb) info probe
Type Provider Name Where Semaphore Object
stap do_thing BEGIN 0x0000000001001a55 0x0000000001004042 /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example
stap do_thing END 0x0000000001001bdb /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example
stap main BEGIN 0x00000000010019b8 0x0000000001004040 /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example
stap main loop 0x00000000010019e9 /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example
(gdb) break -probe main:BEGIN
Breakpoint 1 at 0x10019b8
(gdb) run
Starting program: /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example
Downloading separate debug info for system-supplied DSO at 0x7ffff7ffd000
Breakpoint 1, 0x00000000010019b8 in ?? ()
(gdb)
It also allows passing in up to 12 integer/pointer/float arguments, which I use here to pass a unique id and the fields of std.builtin.SourceLocation:
(gdb) printf "%x %s %s %d %d\n",$_probe_arg0,$_probe_arg1,$_probe_arg2,$_probe_arg3,$_probe_arg4
2a0ead infinite-loop.zig main 2 37
And it's not just gdb that supports them. perf, bcc/bpftrace, and lttng all support USDT probes.
One downside is that it's Linux specific. As I understand it MacOS and the BSDs have DTrace, which SystemTap is based on. Windows has "Event Tracing for Windows".
I ported the C preprocessor macros and assembly (and assembly macros) from SystemTap into Zig. I'm wondering if the overhead of USDT probes is small enough to keep them in the Zig compiler by default, and if the Tracy build option could be replaced with a script that places Tracy instrumentation on the appropriate probes. Most of the tools (except for LTTng) do tracing from the kernel with eBPF, which may add some overhead compared to keeping it all in userland.
Anyway thanks for reading this infodump about probe points
- Or perhaps a more general aggregating metrics API a la OpenTelemetry
@leroycep FWIW, I am trying to build an OpenTelemetry compliant SDK for Zig here, I'd love to hear your thoughts on it 🙏🏼 (starting with Metrics, the most complex one).
If you are trying to get Profiling going, I think you might be interested by the OTel proto code generation I did using protobuf-zig in the repo, can be adapted to create generated code for the OTel Profiling protocol.
Be mindful that currently the Profiling OTel signal is experimental, so the proto files are subject to breaking changes at any time. I know because I was part of the OTel Profiling SIG until last year and I still work with the team building the agent now named opentelemetry-ebpf-profiler.
I've done some more research on DTrace probes and how they differ from SystemTap probes. Most important resource here is oxidecomputer/usdt, specifically these two issue:
An overview of the two formats:
- DTrace probes can have two instructions marked for modification: a
nopand axor rax, rax.- The
xor rax, raxinstruction let's the program know if the probe is enabled. This allows probes to expose data that may be expensive to compute, with minimal impact to unprobed performance. - The
nopis the probe proper. It is treated as a potential function call, and in fact when MacOS added support for DTrace they made the linker turn calls to a symbols like__dtrace_probe$foo$bar$v1intonops. Then when a probe is attached thenopis rewritten to breakpoint instruction (INT3or0xCCon x86). - In addition to describing the location of the probe instructions, DTrace probe notes also describe the parameters as a list of C types in null terminated strings.
- The
- SystemTap probes have a
nopand an optional*volatile u16semaphore:- Programs can use the
*volatile u16semaphore to check if the probe is enabled, but its seems more intended for programs attaching to probes to check if other programs are attached. - The
nopis again the probe proper. However, in this case, thenopis not treated like a potential function call. Instead, each probe generates a space separated list of registers accesses to locate the arguments.
- Programs can use the
- DTrace registers a list of probes at startup. I'm not sure if SystemTap does, and I'm also not sure if DTrace has to do it on all systems.
Pros and cons of each format:
- SystemTap's semaphores are unsuitable if you want to do additional work before a probe in a hot loop, as reading a mutable location in global memory may cause a cache miss.
- SystemTap's argument format is a mixed bag:
- Using assembly/register accesses should theoretically allow disabled SystemTap probes to have less of a performance impact than DTrace's pseudo function call.
- The location description is dependent on what a compiler emits for inline assembly. This can cause difficulty on both the generation and parsing side.
- There is no type information beyond byte size, signedness, and
is_float
- DTrace's
xor rax, raxinstruction to guard probes makes a lot more sense in a hot loop, as it keeps all the necessary information in the instruction stream. - Attaching type information makes sense; using C types is more standard but also kind of meh. Generating the correct C type information is a bit limiting and makes more work for implementing the probes.
I'm wondering if the overhead of USDT probes is small enough to keep them in the Zig compiler by default, and if the Tracy build option could be replaced with a script that places Tracy instrumentation on the appropriate probes.
I did a bit more research on this:
- DTrace and SystemTap currently use 1 byte nop instructions for the probe. This is not enough space for Tracy instrumentation without using an extra debugger process or complicated instruction patching.
- However, x86_64 does have some multi-byte nop instructions that it recommends. A 5-byte
nopcould be replaced with a 32bitcallorjmpinstruction. - The difficulty then becomes generating the correct code for a probe and linking it into the binary.
I think the next steps are as follows:
- Measure the time to compile using a normal release build of Zig.
- Make
src/tracy.zigsupport SystemTap style probes, except with 5-byte widenops - Measure the time to compile using a release build of Zig with that "tracy" backend enabled
- Report back with the results
Note that this still doesn't answer the question of what the generic tracing interface should be. I think DTrace-like probes are a rather elegant solution, but I think it is more "a" solution that should be supported by generic tracing interface and less "the" generic tracing solution.
* [#340 feat: Support Linux stap v3 probes](https://github.com/oxidecomputer/usdt/pull/340)
Heya, just a heads up: this PR did get merged but included a couple of bad mistakes of mine that you ought to be aware of if you follow in its footsteps.
- At least based on the SystemTap USDTs present in eg. libc, the
wNregister names are never used on x86_64 and instead it uses thexNnames regardless of data size: fix. - Basically all linkers except GCC's ld (so lld, mold, and even gold) will remove the SystemTap USDT's
.stapsdt.baseELF section if garbage collection is used. This discovery led even SystemTap itself to add the "R" or "Retain" flag to this section to ensure it isn't GC'd: fix. - I managed to somehow delete an
.align 2command from the "semaphore" ELF section, leading to the section potentially being unaligned and causing issues on ARM processors: fix.
Aside from those, I can highly recommend USDTs as the basis of your tracing interface. They're pretty cool :)
Just as a side note, on the Linux SystemTap side of things compiler support for USDTs could make them more powerful / cheaper than they are in eg. the Rust version. The SystemTap USDT consists of an (optional) semaphore and a nop instruction. The semaphore is a 16-bit unsigned integer; it is read in the instruction stream and compared to zero in absolutely normal ways, nothing odd about it, and if the semaphore value is zero then it means no one is listening to the USDT and all of the work for it should be skipped. The USDT itself is the nop instruction which gets turned into an interrupt but importantly in the SystemTap world, the ELF header explains not only what parameters there are but also where they are. So the USDT can contain arguments like "8 byte signed integer read at an offset 0x20 from the stack pointer", and I believe it's even possible to get values like "8 byte unsigned integer read through a pointer at an offset 0x8 which is read through a pointer at an offset 0x30 from the base pointer", ie. chain pointer offsets and loads.
With compiler support, this would mean that a USDT could simply take a "location" like a.b.c and the compiler would convert those accesses to assembly instructions in the USDT ELF header for loading the actual c value, using registers and offsets to define it. If all the arguments can be defined this way (ie. there is no extra work like allocation that needs to be done to create the USDT arguments) then the semaphore is entire unnecessary and could be dropped by the compiler.
On DTrace systems this doesn't work, as they use the normal function call ABI for parameter passing, meaning that there is ~always some work to be done before the USDT probe triggers. But on those systems there is no semaphore value and instead it's handled by another "is_enabled" probe which takes no arguments and is thus as close to zero-effect as is possible.
Cheers