ringbuf: add zero-copy consumer APIs
The current ringbuf consumer APIs Read/ReadInto always copy samples into a user-provided buffer (see https://github.com/cilium/ebpf/blob/v0.20.0/ringbuf/ring.go#L96). For callers that consume data synchronously, this extra copy is unnecessary.
This change adds zero-copy consumer APIs:
- Peek / PeekInto – return a view into the mmap’d ring buffer without advancing the consumer position.
- Consume – advance the consumer position once the view has been processed.
Existing Read/ReadInto semantics are unchanged and continue to work as before.
A preliminary microbenchmark [1] shows that the zero-copy advantage grows with record size: copy throughput falls as records get larger, while view throughput stays roughly flat.
Single-core run on CPU 2, ring size 1 GiB, Go 1.24.4. Throughput is in million events/second (Mev/s); "speedup" is zero-copy / copy.
| event-size (B) | events/run | copy (Mev/s) | zero-copy (Mev/s) | speedup |
|---|---|---|---|---|
| 128 | 7,895,160 | 45.63 | 49.83 | 1.09x |
| 512 | 2,064,888 | 25.03 | 34.94 | 1.40x |
| 1024 | 1,040,447 | 8.90 | 34.94 | 3.93x |
| 2048 | 522,247 | 4.57 | 29.56 | 6.47x |
[1] https://github.com/jschwinger233/bpf_ringbuf_zc_benchmark
Thank you for the input :pray:
- API: Zero-copy APIs are intentionally single-consumer and mirror existing Read/ReadInto semantics. I’m open to reshaping this (separate interfaces, callback-based API, etc.) if you prefer another direction.
- Performance: Benchmarks in the PR description are updated; for 1024-byte records the zero-copy path is up to ~4× faster.
- CI: a re-run is now green.
From my side this is ready for review. Please feel free to request changes if any part of the design or implementation doesn’t make sense.
I agree with Timo and Florian: Reader needs to be concurrency-safe. As Timo mentioned, a callback-based design is one option. Another would be an iterator, which has the benefit of being a bit more ergonomic for callers because it's easier to break out of a for loop than out of a closure.
```go
// Records iterates over records in the reader until [Reader.Close] is called.
//
// Record.Sample is only valid until the next call to the iterator.
func (*Reader) Records() iter.Seq2[*Record, error]
```
You might have to adjust the method to return Record instead to avoid allocations; it depends a bit on how smart the Go compiler is. Internally the iterator would lock the ring, construct a Record from the ring contents (being careful to re-slice the capacity of Sample to prevent out-of-bounds appends into the buffer), and yield the record. Once the yield returns, it would update the ring's consumer offset and do another round. The tricky part will be figuring out when to take and drop locks here.
Also note that this means View is no longer necessary; the consumer offset state is stored in the iterator instead.
I re-implemented the Records iterator API as Lorenz suggested 🙏