Spans and flame graphs
Currently all our data is associated with a single instant in time: every piece of data is an event.
There are, however, many things that require data to span a time range, such as audio and video.
Spans are also useful for flame graphs, which are a way to visualize a call graph.
Such a flame graph is useful for profiling, but also for observability, i.e. understanding how the pieces of a system are connected.
Implementation
An easy way to implement this is to use a special enum Span { Begin, End } component.
We then have a special time range query that is aware of spans (i.e. querying for data in the time range [#10, #20] must also include a span that covers [#0, #30]).
Together with a special Flame Graph Space View we have a pretty good start.
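A minimal sketch of what the span-aware range query could look like, assuming Begin/End markers are logged as plain rows. The names `Marker`, `pair_spans` and `query_range` are made up for illustration and are not existing Rerun APIs:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Span(Enum):
    BEGIN = "begin"
    END = "end"


@dataclass(frozen=True)
class Marker:
    """One logged row (hypothetical): a span name, a Begin/End marker, a time."""
    name: str
    kind: Span
    time: int


def pair_spans(markers: list[Marker]) -> list[tuple[str, int, Optional[int]]]:
    """Pair Begin/End markers into (name, start, end); end is None if never closed."""
    open_starts: dict[str, list[int]] = {}
    spans: list[tuple[str, int, Optional[int]]] = []
    for m in sorted(markers, key=lambda m: m.time):
        if m.kind is Span.BEGIN:
            open_starts.setdefault(m.name, []).append(m.time)
        elif open_starts.get(m.name):
            # Close the most recent open span with the same name.
            spans.append((m.name, open_starts[m.name].pop(), m.time))
    # Whatever is left was begun but never ended (a half-open span).
    for name, starts in open_starts.items():
        spans.extend((name, start, None) for start in starts)
    return spans


def query_range(markers: list[Marker], lo: int, hi: int):
    """Return every span overlapping [lo, hi], even if it began before lo."""
    return [
        (name, start, end)
        for name, start, end in pair_spans(markers)
        if start <= hi and (end is None or end >= lo)
    ]
```

With a span covering [#0, #30] and another covering [#12, #18], `query_range(markers, 10, 20)` returns both, whereas a naive point query over [#10, #20] would only see the markers at #12 and #18.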
Threads and processes
For multi-threaded or multi-process data we must have one flame graph per thread:
We should also be able to record relationships between threads. For instance, we want to be able to see that thread A is blocked waiting on threads B and C (see also https://github.com/EmbarkStudios/puffin/issues/174).
This also means log events should come with a ProcessId and ThreadId component.
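As a rough illustration, the process and thread of the caller can be captured with the standard library and then used to split events into one lane per thread. The helpers `origin` and `group_by_thread` are hypothetical names, and the tuple layout is an assumption rather than a proposed component format:

```python
import os
import threading
from collections import defaultdict


def origin():
    """Return the (process id, thread id) pair of the calling thread."""
    return os.getpid(), threading.get_ident()


def group_by_thread(events):
    """Group (pid, tid, payload) rows so each thread gets its own lane."""
    lanes = defaultdict(list)
    for pid, tid, payload in events:
        lanes[(pid, tid)].append(payload)
    return dict(lanes)
```

Each key of the returned dictionary would then correspond to one flame graph.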
API
For Python, a with scope makes sense, as does a function decorator:
@rr.span
def my_function(images):
    for image in images:
        with rr.span(f"image {image.name}"):
            process(image)
…with optional recording argument
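One way a single name could serve as both decorator and with scope is sketched below. This is not an existing Rerun API: `span` stands in for the proposed `rr.span`, and `RECORDED` is a hypothetical in-memory stand-in for a recording.

```python
import time
from contextlib import ContextDecorator

RECORDED = []  # hypothetical default recording: (name, "begin"/"end", time_ns) rows


class _Span(ContextDecorator):
    """Logs a begin row on entry and an end row on exit."""

    def __init__(self, name, recording=None):
        self.name = name
        self.recording = RECORDED if recording is None else recording

    def __enter__(self):
        self.recording.append((self.name, "begin", time.perf_counter_ns()))
        return self

    def __exit__(self, *exc_info):
        self.recording.append((self.name, "end", time.perf_counter_ns()))
        return False  # never swallow exceptions raised inside the span


def span(arg, recording=None):
    """Usable as `@span` on a function or as `with span("name"):`."""
    if callable(arg):
        # Bare decorator: name the span after the decorated function.
        return _Span(arg.__name__, recording)(arg)
    return _Span(arg, recording)
```

`ContextDecorator` from the standard library provides the decorator behaviour, so the begin/end logic only has to be written once.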
In Rust and C++ we would need to use macros, similar to e.g. puffin and loguru.
See also
- https://github.com/rerun-io/rerun/issues/2852
- https://github.com/rerun-io/rerun/issues/2963
- https://github.com/rerun-io/rerun/issues/4622
It probably makes a lot of sense to both take inspiration from OpenTelemetry and make sure we're ultimately compatible with it. In this case it is worth looking at the tracing package: https://opentelemetry-python.readthedocs.io/en/latest/api/trace.html
> Implementation
> An easy way to implement this is to use a special enum Span { Begin, End } component.
> We then have a special time range query that is aware of spans (i.e. querying for data in the time range [#10, #20] must include time ranges that span [#0, #30]).
> Together with a special Flame Graph Space View we have a pretty good start.
I'm not sure I understand why we need to introduce a new/special component for this?
Since the timepoint we have today is effectively a start timepoint, an alternative implementation I had in mind was to introduce a second, optional timepoint for every log event, which specifies the end timepoint of the event (which therefore becomes a span rather than an event at this point). If the end timepoint isn't specified, then we only look at the start timepoint and consider the event to be instantaneous, as we do today. Otherwise it's a span.
This naturally allows spans that cover different time units (e.g. "this event spanned 278ms wall-clock time (log_time), 90 simulation ticks (sim_tick) and was instantaneous on the frame timeline (frame_nr)").
Then I don't think we need to change anything query-wise? Haven't thought about it enough to be sure though.
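To make the idea concrete, here is a rough sketch of an event with an optional end timepoint per timeline. `LogEvent`, its fields and `overlaps` are names invented for this example, and the overlap check is only one guess at how a range query could treat such events:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogEvent:
    payload: str
    start: dict[str, int]  # timeline name -> start time
    end: dict[str, int] = field(default_factory=dict)  # timeline name -> end time, if any

    def interval(self, timeline: str) -> Optional[tuple[int, int]]:
        """Return (start, end) on a timeline; end defaults to start (instantaneous)."""
        if timeline not in self.start:
            return None
        start = self.start[timeline]
        return start, self.end.get(timeline, start)

    def is_span(self, timeline: str) -> bool:
        """True if the event covers a non-zero range on the given timeline."""
        iv = self.interval(timeline)
        return iv is not None and iv[1] > iv[0]


def overlaps(event: LogEvent, timeline: str, lo: int, hi: int) -> bool:
    """True if the event's interval on the timeline intersects [lo, hi]."""
    iv = event.interval(timeline)
    return iv is not None and iv[0] <= hi and iv[1] >= lo


# The example from the text: 278 ms on log_time, 90 ticks on sim_tick,
# instantaneous on frame_nr (the absolute values are arbitrary).
step = LogEvent(
    payload="step",
    start={"log_time": 1000, "sim_tick": 10, "frame_nr": 7},
    end={"log_time": 1278, "sim_tick": 100},
)
```

Under this scheme an instantaneous event is just the degenerate case where the interval has zero length, so the same overlap check handles both events and spans.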
That is another way of implementing it for sure, but it is quite useful to be able to distinguish an event from a span, and it is also useful to be able to express half-open spans (spans with just a start or just an end).
I envision a flame-graph-like view where log events (e.g. text and images) are shown as single points inside the span that contains them.
Maybe we should separate these concepts as:
- Event: data + a time point
- Duration event: data + start and end time
- Span: an operation (unit of work) + a start and end time.
- A hierarchical set of spans makes up a trace
- A trace can be visualized as a flame graph
- Multiple events can be produced within a span
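A small sketch of how these concepts could fit together, assuming spans on a single thread are properly nested. `Span`, `flame_rows` and `containing_span` are illustrative names only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    start: int
    end: int


def flame_rows(spans: list[Span]) -> list[tuple[int, str]]:
    """Return (depth, name) per span, where depth is the number of enclosing spans."""
    rows = []
    stack: list[Span] = []
    # Earlier start first; on ties the longer (outer) span comes first.
    for s in sorted(spans, key=lambda s: (s.start, -s.end)):
        # Drop spans that finished before this one started.
        while stack and stack[-1].end <= s.start:
            stack.pop()
        rows.append((len(stack), s.name))
        stack.append(s)
    return rows


def containing_span(spans: list[Span], t: int) -> Optional[Span]:
    """Return the innermost (shortest) span that contains time t, if any."""
    candidates = [s for s in spans if s.start <= t <= s.end]
    return min(candidates, key=lambda s: s.end - s.start, default=None)
```

The depth from `flame_rows` gives the row of each span in a flame graph, and `containing_span` picks the span inside which a point event (e.g. a text log) would be drawn.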
Some kind of flame graph support would be great because it would allow creating a Common Trace Format (CTF) data loader to analyze a program's runtime behavior together with the output data it generates.