ros2_tracing

Process trace events into intermediate storage format


As first discussed in https://github.com/safe-ros/ros2_profiling/pull/1

The idea would be to read raw CTF traces into some intermediate time-series data that is well suited for analysis tasks. Further high-level APIs could be built to ingest the intermediate data.

General requirements:

  • Disk and memory-efficient
  • Readable/writable across multiple languages
  • Convenient API for interacting with time series, as well as basic relational operations like JOIN

tracetools_read currently uses pandas DataFrames

Proposed alternatives:

  • https://parquet.apache.org/
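For illustration, here is a minimal sketch of what the Parquet route could look like from pandas. It assumes pyarrow (or fastparquet) is installed as the Parquet engine, and the column names are made up:

```python
import pandas as pd

# Hypothetical per-event rows, as they might come out of a processing step.
events = pd.DataFrame({
    "timestamp": [1675692000000, 1675692000500, 1675692001000],  # ns
    "event_name": ["callback_start", "callback_end", "callback_start"],
    "tid": [4242, 4242, 4243],
})

# Write the intermediate data to Parquet: columnar, compressed on disk, and
# readable from C++, Java, R, Rust, etc. through Apache Arrow implementations.
events.to_parquet("events.parquet", index=False)

# Read it back, loading only the columns a given analysis needs.
loaded = pd.read_parquet("events.parquet", columns=["timestamp", "event_name"])
```

Being columnar, Parquet would cover both the disk/memory-efficiency and the multi-language requirements above.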

CC: @iluetkeb

mjcarroll · Feb 06 '23 14:02

@mjcarroll I would just like to clarify a few things:

  1. tracetools_read (in this repository)
    1. Currently uses the babeltrace Python bindings to read a CTF trace from disk and return a list of events as Python dictionaries; it doesn't do anything else.
  2. tracetools_analysis (in ros-tracing/tracetools_analysis)
    1. Reads events from a CTF trace using tracetools_read and writes the dictionaries to a file (pickle). This is because reading from a pickle file is quicker than reading the CTF trace using babeltrace, so we only read the actual CTF trace once and then just read the pickle file afterwards. See tracetools_analysis/process.py's process() function or the load_file() function, which is usually what's used in Jupyter notebooks, as you probably know.
    2. Processes events one by one and writes some data to pandas DataFrames. See tracetools_analysis/processor/ros2.py and tracetools_analysis/data_model/ros2.py, respectively. A single row in a DataFrame roughly corresponds to a single trace event, but at this point the trace events are abstracted away.
      1. To improve performance, it actually first writes data to normal Python lists, and then converts these lists to DataFrames once all trace events have been processed, since appending to a Python list is much faster than appending to a DataFrame (see the sketch after this list).
    3. Then some functions are written to compare/merge/etc. DataFrames to extract high-level information. See files under tracetools_analysis/utils/.
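To make the 2.i/2.ii pattern concrete, here is a rough sketch. This is not the actual tracetools_analysis code; the event field names (`_timestamp`, `_name`) and the pickle layout are assumptions based on the description above:

```python
import pickle

import pandas as pd

def load_events(pickle_path):
    # Step 2.i: read the pre-converted events (a list of dicts) from the
    # pickle file, which is much faster than re-reading the CTF trace
    # with babeltrace.
    with open(pickle_path, "rb") as f:
        return pickle.load(f)

def process(events):
    # Step 2.ii: accumulate columns in plain Python lists first...
    timestamps, names = [], []
    for event in events:
        timestamps.append(event["_timestamp"])
        names.append(event["_name"])
    # ...then build the DataFrame once at the end; appending row-by-row
    # to a DataFrame is far slower than appending to a list.
    return pd.DataFrame({"timestamp": timestamps, "event_name": names})
```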

> The idea would be to read raw CTF traces into some intermediate time-series data that is well suited for analysis tasks. Further high-level APIs could be built to ingest the intermediate data.

So I'm guessing you're talking about an alternative to step 2.ii, and not talking about storing the events themselves?

Then in parallel we can change steps 1.i/2.i, which is kind of more related to #22.

christophebedard · Feb 07 '23 22:02

> So I'm guessing you're talking about an alternative to step 2.ii, and not talking about storing the events themselves?

Yes, mostly talking about an alternative to 2.ii in this outline.

Depending on the outcome of #22, there may be potential to collapse 2.i and 2.ii into a single step, for example if ctf -> <intermediate format> is wildly more efficient than ctf -> pickled dict -> intermediate format while retaining all the same information. I don't see this as a high priority, though.
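A hedged sketch of what a direct ctf -> <intermediate format> step could look like with the babeltrace2 Python bindings (bt2), skipping the pickle step entirely; the bt2 API usage reflects my understanding of the bindings and would need verification:

```python
import bt2
import pandas as pd

def ctf_to_parquet(trace_path, out_path):
    timestamps, names = [], []
    # Iterate over the CTF trace directly with the babeltrace2 bindings,
    # with no intermediate pickle file.
    for msg in bt2.TraceCollectionMessageIterator(trace_path):
        if type(msg) is bt2._EventMessageConst:
            timestamps.append(msg.default_clock_snapshot.ns_from_origin)
            names.append(msg.event.name)
    # Convert once at the end and write straight to the intermediate format.
    pd.DataFrame({"timestamp": timestamps, "event_name": names}).to_parquet(out_path)
```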

mjcarroll · Feb 07 '23 22:02

Coming at this from the usage end, we have two different kinds of information in the CTF trace:

  1. meta-data, essentially mapping names of functions and endpoints to thread IDs and memory addresses
  2. activity data, that is, callbacks being called, messages being sent/received, etc.

Meta-data is emitted first, but due to things like the lifecycle, system modes, and more complex launch scenarios, the entire trace file has to be scanned to be sure to get everything. We usually need all meta-data for later association. For reasons of efficiency and storage size, I am assuming that we want to store meta-data separately during later stages as well, but note that we never measured the advantage of this; due to things like category tables etc., merged storage might actually be comparable.

In contrast, for activity data, it is often sufficient, and quite useful, to process just parts of it, usually temporal chunks. For example, when analyzing performance, we usually need to differentiate at least whether the system is starting up, idle, active, or shutting down. Many systems also frequently switch between active and idle.

Last but not least, memory constraints can make it necessary to load data only partially (see the sketch below).

I think it doesn't matter very much in practice whether we store data before or after it has been converted into a pandas DataFrame, assuming that we're using one of several data storage formats which can be easily written from and loaded into pandas DataFrames (like those from Apache Arrow).
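As a sketch of the partial-loading point above, assuming Parquet with the pyarrow engine: both column selection and row filters can be pushed down to the reader (the file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical bounds of a temporal chunk of interest (ns), e.g. the
# "active" phase of the system.
t_start, t_end = 1_000_000_000, 2_000_000_000

# Load only the needed columns, and only the rows inside the chunk. With
# the pyarrow engine, the filters are pushed down to the Parquet reader,
# so irrelevant row groups never have to be fully loaded into memory.
active = pd.read_parquet(
    "events.parquet",
    columns=["timestamp", "event_name", "tid"],
    filters=[("timestamp", ">=", t_start), ("timestamp", "<", t_end)],
)
```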

iluetkeb · Feb 10 '23 13:02

> Meta-data is emitted first, but due to things like the lifecycle, system modes, and more complex launch scenarios, the entire trace file has to be scanned to be sure to get everything.

I was hoping, but could not find evidence, that babeltrace2 would let us filter on event type/name, such that you could iterate over all metadata before doing filtered views of the longer-running event data.
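Absent a built-in name filter, one possibility is filtering on the Python side while iterating. A sketch, with the caveat that this is not a native babeltrace2 filter component (the whole trace still gets decoded) and the listed event names are just a few of the ros2_tracing metadata tracepoints, for illustration:

```python
import bt2

# A small subset of ros2_tracing metadata tracepoints, for illustration.
METADATA_EVENTS = {
    "ros2:rcl_node_init",
    "ros2:rcl_publisher_init",
    "ros2:rcl_subscription_init",
}

def iter_events(trace_path, names=None):
    # Filter by event name on the Python side while iterating. The whole
    # trace is still decoded, but downstream processing only ever sees
    # the selected events.
    for msg in bt2.TraceCollectionMessageIterator(trace_path):
        if type(msg) is bt2._EventMessageConst and (
            names is None or msg.event.name in names
        ):
            yield msg.event

# First pass: collect all metadata before looking at activity events.
# metadata = list(iter_events("/path/to/trace", METADATA_EVENTS))
```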

mjcarroll · Feb 10 '23 16:02