otel-profiling-agent

Benchmarking changes of the wire protocol

rockdaboot opened this issue 1 year ago · 3 comments

When making changes to the wire protocol, we should take into account the effect on CPU usage, memory usage, and network bandwidth. For this we need some tooling for doing (nearly) reproducible benchmarks.

Roughly, my thoughts are:

  • record data passed to the Reporter
  • replay previously recorded data (with the same order and timing!)
  • record the uncompressed on-wire messages (protobuf blobs)
  • a benchmark Go tool that does compression and decompression of the protobuf messages (Go because we want to measure the Go implementations of the compressors)
  • a Python tool to generate diagrams / tables from the results of the Go tool

[diagram: Profiling - Protocol Benchmarking]

The recorded data can be replayed multiple times, e.g. with and without a protocol implementation change, to allow comparisons of the change's effects.

rockdaboot · Aug 12 '24 10:08

When establishing the OTel Profiling protocol, @petethepig invested considerable time and effort in benchmarks - see https://github.com/petethepig/opentelemetry-collector/pull/1. He also documented changes and potential options in https://docs.google.com/spreadsheets/d/1Q-6MlegV8xLYdz5WD5iPxQU2tsfodX1-CDV1WeGzyQ0/edit?gid=1732807979#gid=1732807979.

It might be worth considering building on this existing work.

florianl · Aug 12 '24 11:08

A simpler approach that @christos68k and I have used previously is to build two profiling agents with the two protocols you want to compare, run them at the same time on the same machine under a heavy workload, and record the sum of all message sizes. Sampling won't interrupt exactly the same traces in both agents, but if you run it for an hour or so it should statistically give you a pretty good estimate. From previous experience looking at differential flamegraphs of two agents running on the same machine, I'd expect the error to be in the realm of 0.5 - 1% with that approach. It's arguably harder for other reviewers to reproduce than @petethepig's approach or the one you are describing in this issue.

athre0z · Aug 12 '24 18:08

#120 is a PoC for the ideas outlined in the issue description.

rockdaboot · Aug 15 '24 16:08