
Document e2e logging performance for time series data

Open · nikolausWest opened this issue 1 year ago · 2 comments

We want to benchmark logging scalars, including setting a timeline value for each logged scalar, e.g. something like:

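import math
import rerun as rr

rr.init("rerun_example_scalar_benchmark", spawn=True)  # app id is illustrative; spawn=True starts a viewer

for frame_nr in range(1_000_000):
    rr.set_time_sequence("frame", frame_nr)  # one timeline value per logged scalar
    rr.log("scalar", rr.TimeSeriesScalar(math.sin(frame_nr / 1000.0)))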
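
To turn the loop into a throughput figure, one option is to time it with a wall-clock timer and divide by the number of scalars logged. This is a minimal sketch, assuming a hypothetical `measure_throughput` helper and `log_one` callback (neither is part of the Rerun SDK). It only times the SDK-side logging calls, so it likely gives an upper bound on the end-to-end (logging -> visualization) number rather than that number itself.

```python
import math
import time
from typing import Callable


def measure_throughput(log_one: Callable[[int, float], None], num_scalars: int) -> float:
    """Call `log_one(frame_nr, value)` once per frame; return scalars logged per second.

    Only the SDK-side calls are timed, not viewer ingestion or rendering.
    """
    start = time.perf_counter()
    for frame_nr in range(num_scalars):
        log_one(frame_nr, math.sin(frame_nr / 1000.0))
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero-length interval
    return num_scalars / elapsed


# Hypothetical usage with the Python SDK of that era (rr.TimeSeriesScalar):
#
# import rerun as rr
# rr.init("rerun_example_scalar_benchmark", spawn=True)
#
# def log_one(frame_nr: int, value: float) -> None:
#     rr.set_time_sequence("frame", frame_nr)
#     rr.log("scalar", rr.TimeSeriesScalar(value))
#
# print(f"{measure_throughput(log_one, 1_000_000):,.0f} scalars/second")
```

Passing the logging call in as a callback keeps the timing harness independent of the SDK, so the same code can wrap a no-op to measure the harness's own overhead.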

We already have a tool for this:

just rs-plot-dashboard --num-plots 10 --num-series-per-plot 5 --num-points-per-series 5000 --freq 1000
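
Assuming each point in each series is exactly one logged scalar (inferred from the flag names, not confirmed), the invocation above logs 10 × 5 × 5000 scalars in total; dividing that count by the measured wall-clock time gives scalars per second. The `--freq 1000` flag is deliberately left out because its exact semantics are not stated here. `total_scalars` is a hypothetical helper for illustration.

```python
def total_scalars(num_plots: int, num_series_per_plot: int, num_points_per_series: int) -> int:
    # Assumes every point in every series is exactly one logged scalar.
    return num_plots * num_series_per_plot * num_points_per_series


# Parameters taken from the `just rs-plot-dashboard` invocation above.
print(total_scalars(10, 5, 5000))  # 250000
```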

For each language (C++, Python, Rust), measure the maximum end-to-end throughput (logging -> visualization) in scalars per second, for both single-threaded/single-plot logging and multi-threaded logging. That gives 3 × 2 = 6 throughput figures.
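For the multi-threaded figure, one approach is to start several logging threads together and divide the total number of scalars by the wall-clock time until all of them finish. This is a sketch under stated assumptions: `measure_threaded_throughput` and the `log_one(thread_idx, frame_nr, value)` callback are hypothetical, and each thread is assumed to log to its own entity path. In CPython the results may be limited by the GIL, depending on how much of the work happens in native code.

```python
import math
import threading
import time
from typing import Callable


def measure_threaded_throughput(
    log_one: Callable[[int, int, float], None],
    num_threads: int,
    scalars_per_thread: int,
) -> float:
    """Run `num_threads` threads, each logging `scalars_per_thread` scalars.

    Returns the aggregate scalars per second across all threads.
    """
    # The main thread is one extra party, so it controls when the workers start.
    barrier = threading.Barrier(num_threads + 1)

    def worker(thread_idx: int) -> None:
        barrier.wait()  # block until every thread (and the main thread) is ready
        for frame_nr in range(scalars_per_thread):
            log_one(thread_idx, frame_nr, math.sin(frame_nr / 1000.0))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    barrier.wait()  # release all workers at once
    start = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero-length interval
    return num_threads * scalars_per_thread / elapsed


# Hypothetical usage with the Python SDK of that era; assumes the timeline set
# by rr.set_time_sequence applies per thread (check the SDK docs):
#
# def log_one(thread_idx: int, frame_nr: int, value: float) -> None:
#     rr.set_time_sequence("frame", frame_nr)
#     rr.log(f"scalar/{thread_idx}", rr.TimeSeriesScalar(value))
```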

We also want to measure the viewer's memory use after logging roughly 100M scalars, to quantify the RAM overhead.
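One way to express the RAM overhead is to compare the viewer's measured memory growth with the raw size of the logged data. This sketch assumes 16 bytes of raw payload per logged scalar (an 8-byte f64 value plus an 8-byte i64 sequence time); that figure and the helper name `ram_overhead` are assumptions for illustration. The resident-memory numbers would come from an external tool such as Activity Monitor.

```python
# Assumed raw payload per logged scalar: one f64 value + one i64 sequence time.
RAW_BYTES_PER_SCALAR = 8 + 8


def ram_overhead(rss_before_bytes: int, rss_after_bytes: int, num_scalars: int) -> tuple[float, float]:
    """Return (viewer RAM bytes per scalar, overhead factor relative to raw payload).

    The RSS values are measured externally, before and after logging.
    """
    bytes_per_scalar = (rss_after_bytes - rss_before_bytes) / num_scalars
    return bytes_per_scalar, bytes_per_scalar / RAW_BYTES_PER_SCALAR
```

Under the 16-byte assumption, the raw payload for 100M scalars alone is 100M × 16 B = 1.6 GB, which gives a baseline to compare the measured figure against.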


Then manually document the results somewhere in our docs, e.g.:

On a 2023 MacBook M1:

Language   Single-threaded   Multi-threaded
C++        ? kHz             ? kHz
Python     ? kHz             ? kHz
Rust       ? kHz             ? kHz

Viewing 100M scalars uses up ? GB of RAM in the native viewer.

Very rough numbers are fine, e.g. "~10M scalars/second".
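Since rough numbers are fine, a small formatting helper can keep the reported figures consistently rounded. This is a hypothetical sketch (`rough_rate` is not part of any existing tooling) that rounds to two significant figures and then picks a k/M/G suffix.

```python
def rough_rate(scalars_per_second: float) -> str:
    """Format a throughput as a rough figure, e.g. 12_345_678 -> "~12M scalars/second"."""
    # Round to two significant figures first so values like 999_999.9 become 1M, not 1000k.
    rounded = float(f"{scalars_per_second:.2g}")
    for divisor, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "k")):
        if rounded >= divisor:
            return f"~{rounded / divisor:g}{suffix} scalars/second"
    return f"~{rounded:g} scalars/second"
```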

nikolausWest · Jan 23 '24 12:01

We should also link to https://github.com/rerun-io/rerun/issues/4423.

emilk · Jan 29 '24 10:01

I know a decision was made to punt on this (and it was moved to Triage), so I'm lowering its urgency.

It would be nice to have a short comment explaining why we are punting on this, though.

emilk · Feb 06 '24 20:02