tracy [multicapture 3/3] Add multi-capture capabilities to `capture` + merge utility

suggestion towards #471

Description

While a network-level multiplex seems the most transparent way to implement multiprocess support, @cipharius and I have seen that it requires an advanced and very precise knowledge of the packets in transit. Thus there is an inherent fragility to it (cf https://github.com/wolfpld/tracy/pull/766#issue-2226491140).

Here, I wanted to propose a "dumber" and more stable approach that does not have to know a lot about the internals. My goal by doing so was to provide a first step to support multi-process workflows, with a minimal amount of maintenance. In the limited amount of work time I got, this looked like a more reasonable approach to explore first.

The general approach

multi-capture is a "generalized" version of tracy-capture that listens to UDP discovery packets. Its job is to run one "capture" per discovered process. In more details, this boils down to creating multiple Worker instances pointing to the right addresses, and let them do their job.
merge can combine multiple trace files into a single one. Its strategy is
- for each input file
  - ask a Worker to read it (like we do when opening the profiler GUI)
  - convert its internal data to lists of ImportEventTimeline, ImportEventMessages...
- merge all these event lists
- spawn a Worker to import those merged lists (that's the constructor we use for import-chrome/import-fuchsia
- ask this worker to save its data into an output file

Pros

Both tools are very simple: they don't use Worker internals, nor do assumptions about the way things are stored / exchanged on the network
...and as a consequence I expect them to be easy to maintain in the long term
there isn't any performance impact for people working in single-process mode
on its own, multicapture is useful too: it helps dealing with the fact that multiple tracy-enabled processes might be living at the same time, and the user might not know which one the standard capture will talk too.

Cons:

we don't support live visualization, only capture
we lose information in the process: frame images, lock events, system stack frames...
the export-merge-import workflow is very inefficient w.r.t to what it could be. With the current approach,
It's a 2-step process, and thus a bit cumbersome

While (1) is definitely a consequence of the design, I think (4) is totally fixable. We can work gradually work to improve the state of (2) and (3) by iterating on the existing import interface (that could benefit the import-chrome/fuchsia utilities too) and on the export process. It might be even possible to directly merge multiple Workers, but I haven't looked into that.

I am very aware that this is NOT the design path @wolfpld suggested when discussing multi-process support; so I wanted to show you the code as soon as I felt it worthy of your time.

As mentioned in the pros, it might well also be that multicapture is a sufficient solution on its own for some use cases. For my individual experience, this would already solve most of the pain when profiling a library used by several programs running together - but I can't presume whether this is a frequent pain point.

How to reproduce / test

Capture time

simply run tracy-multicapture with a prefix name

./tracy-multicapture -o test

tracy-multi-capture-reenc.webm

Merge

give the list of input traces to tracy-merge

# or you can use a glob pattern `test.*.tracy` to select them all
./tracy-merge -o merged-output.tracy test.449505.tracy test.449456.tracy

reading test.449505.tracy
Waiting for source locations
tracy-test @ 2024-07-12 17:53:23 (tracy-test)
- 22 events
- 2776 messages
- 4 plots
reading test.449456.tracy
Waiting for source locations
tracy-test @ 2024-07-12 17:53:22 (tracy-test)
- 22 events
- 1038 messages
- 4 plots
Writing merged-output.tracy
done

Visualize

./tracy-profiler merged-output.tracy

maim_2024-07-12T18-35-14+02-00

Jul 12 '24 18:07 Arpafaucon

multi-capture is a "generalized" version of tracy-capture that listens to UDP discovery packets. Its job is to run one "capture" per discovered process. In more details, this boils down to creating multiple Worker instances pointing to the right addresses, and let them do their job.

we don't support live visualization, only capture

It may be worth considering if this can be done within the capture utility, as a step towards multiple simultaneous traces, like requested in #203.

I am very aware that this is NOT the design path @wolfpld suggested when discussing multi-process support; so I wanted to show you the code as soon as I felt it worthy of your time.

It's ok.

Jul 13 '24 12:07 wolfpld

It may be worth considering if this can be done within the capture utility

I can have a look into that

Jul 13 '24 16:07 Arpafaucon

It may be worth considering if this can be done within the capture utility

Some personal notes after a short glance:

:heavy_plus_sign: we can definitely do better in multicapture to emulate the capture experience

reuse the same progress line (with Tx/Rx, elapsed time infos)
print failures when they happen

:heavy_minus_sign: but I fear we won't be able to seamlessly integrate multi-capture

minor, but I don't think we can keep the live-update print with multiple workers (we would need to go back to previous lines, that's not portable accross the supported platforms, is it?)
there'll be some work in the CLI interface to properly convey the options available in multi-mode
- address and port don't really make sense in multi-capture (though we could maybe later filter discovered clients by their IP address / ports)
- memlimit is easy to implement by worker, quite harder if the user expects to control the combined memory footprint of all of them
- for the output file(s)
  - in multi mode, maybe we can consider -o as the common filename prefix, with the special case that -o name.tracy means the set of files name.<pid>.tracy
  - what do you recommend to do if we see that some prefix.<number>.tracy exist and the user has not given -f? I'd recommend raising an error and exiting, but that might be overly restrictive (we'll only have a collision if we profile a process that has exactly this number, this is probably rare ... but we won't know until runtime).

Next steps;

I'll move functions that can be used in both utilities in a shared header, to at the very least minimize duplication
I'll have a go implementing the missing features in multi-capture so it can act as a drop-in replacement

I'll push that further to the code and see how it goes

Jul 14 '24 20:07 Arpafaucon

minor, but I don't think we can keep the live-update print with multiple workers (we would need to go back to previous lines, that's not portable accross the supported platforms, is it?)

I have no idea, but there are programs that are capable of doing that. Maybe the readline library would be of use here?

there'll be some work in the CLI interface to properly convey the options available in multi-mode

Valid points, but not show-stoppers. It may make sense to split the available options into three categories:

Common options.
Single capture options.
Multi capture options.

Jul 16 '24 18:07 wolfpld

I have no idea, but there are programs that are capable of doing that. Maybe the readline library would be of use here?

For me that was just about editing a single line (if you indeed mean https://tiswww.case.edu/php/chet/readline/rltop.html)

Since then I've found this SO thread - I'll have to test it but that might do the trick for linux at least : https://stackoverflow.com/a/11474509/7002298

Jul 16 '24 19:07 Arpafaucon

I updated the branch with an updated multicapture utility that embeds the single capture mode (the end goal being that we merge this new file as capture.cpp, but for the time being it helps to have the reference source untouched)

General refactoring

I have extracted some functions/constants from the code here and there to make them available without duplication. You can see the diff here: https://github.com/wolfpld/tracy/commit/3d235b8451159f97446a103578adea7747312d85. If you prefer, I can spin up a dedicated PR for this diff, there should not be any functional impact.
Within the capture/ folder. I moved some common utilities to stand-alone functions in lib.cpp to reduce duplication: handshake message check, instrumentation failure print, and (not yet integrated into multicapture mode) live update print. Again, the goal being there to avoid any functional impacts.

Integration of multi-capture in `tracy-capture`

Valid points, but not show-stoppers. It may make sense to split the available options into three categories:

I have a working argument parser that will dispatch then to runCaptureSingle or runCaptureMulti.

I added support for long options on the way
Here is the help message, do you think it clear enough ?

❯ ./tracy-multicapture
Usage: multicapture -o <output/prefix> [--multi] [-a address] [-p port] [-s seconds] [-m percent]
Detect and capture multiple tracy clients from their UDP broadcast,
 and store the output in files named '<PREFIX>.<client_PID>.tracy'.

Options (a SINGLE tag indicates the option is only accessible in single capture mode):
  -o/--output <str>        Output file path (in MULTI mode, it is interpreted as a prefix)
  -f/--force               Overwrite existing files
  -M/--multi               Enable multi-capture mode
  -v/--verbose             Verbose output
  -a/--address <ipv4 str>  [SINGLE] Target IP address
  -p/--port <int>          [SINGLE] Target port
  -s/--stop-after <float>  [SINGLE] Stop profiling after this duration (in seconds)
  -m/--memlimit <float>    [SINGLE] Set a memory limit (in % of the total RAM)

In single-capture mode, 'capture' directly connects to the TCP data stream at address:port .
In multi-capture mode, profiled targets are detected from UDP broadcast packets;
capture stops once all detected targets disconnect.

there are some options we could make use of in multicapture, but I suggest to defer that to a later MR

Next steps

Still to do:

live print in multi-capture
spin up a PR for a minor addition in tracy-test: the ability to trigger some failures to check the profiler's response to them (commit: 59b2f2ef378ee907eb37c578405d8bd6639b853d) -> #833
there are build failures in the CI for MacOS, to investigate

And to recap the things I'd welcome your input on:

I'd feel more confident if we adopt this "embedding" strategy (=limit as much as possible the changes in the single-capture mode) as a first step to add support for multicapture. This way, we're sure to avoid breaking anything on the single-capture use-case, while being able to gradually refactor and merge the logic. WDYT ?
OK with the CLI interface + help message ?
OK with the refactoring commit (notably in profiler/main.cpp). Do you prefer a separate PR ?
maybe the merger part can go to a separate PR ?

EDIT: updated the link to commit after rebasing to master

Jul 17 '24 15:07 Arpafaucon

Quite happy about this live update, what do you think ?

tracy-multi-capture-liveupdate.webm

Jul 17 '24 19:07 Arpafaucon

Quite happy about this live update, what do you think ?

It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?

Jul 17 '24 19:07 wolfpld

Quite happy about this live update, what do you think ?

It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?

tracy-multi-capture-liveupdate-clean.webm

Jul 17 '24 19:07 Arpafaucon

Looks great!

Jul 17 '24 20:07 wolfpld

If an error occurs (cf other MR ^^), we get this

maim_2024-07-17T22-00-11+02-00

It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?

I've now hidden most of the relevant (at least I think) additional infos behind --verbose

Jul 17 '24 20:07 Arpafaucon

Marking the MR as ready for review

CI passed on linux, windows and MacOS
I'll remove the commit related to .clang-format -> suggested in a different PR #835 , your call
If the code looks good I can replace capture source code with the updated version (and remove the existence of lip.cppfor common functions)

Regarding merge strategy, I can spin up smaller MRs with more atomic changes if you want to review things in multiple steps. It would look like

(A) refactor existing code to extract common functions / constants
(B) support for multi-process capture in capture
(C) tracy-merge utility

Jul 18 '24 11:07 Arpafaucon

It seems there are some commits here which are intended as fixups in a rebase.

Jul 18 '24 20:07 wolfpld

It seems there are some commits here which are intended as fixups in a rebase.

not sure what you mean by that; in any case I rebased and cleaned up the branch history to clarify the separation A/B/C as mentioned in https://github.com/wolfpld/tracy/pull/825#issuecomment-2236307872

* 26df5524 (HEAD -> multicapture-merge) [multicapture] New binary, move common functions to lib.cpp
* 9111933e [capture] Refactor: extract common functions
* 11bea71b [merger] multiprocess/ folder, with tracy-merge binary
* 59a92210 [refactor] Extract useful functions/constants for capture

Jul 19 '24 14:07 Arpafaucon

I extracted the non-functional refactors into #837 to simplify the review, and make sure those changes are not impacting anything. That may be a good starting point to merge the changes, and will reduce the diff here.

Same for #838 for the merge utility that has no interaction with the changes in capture

Jul 19 '24 14:07 Arpafaucon

Note: if you're interested, I can also spin up a dedicated PR for the internal refactor in capture.cpp (commit https://github.com/wolfpld/tracy/commit/9111933e08a82fa50e902210b109e8bf5b0759f6). This way, the diff for the actual functional changes of multi-capture will be much clearer

Jul 19 '24 14:07 Arpafaucon

Please run the changes you are proposing through clang-format. At this point it's hard to review.

Aug 08 '24 17:08 wolfpld

Please run the changes you are proposing through clang-format. At this point it's hard to review.

Sure ! Just did it on this MR.

To clarify, do you intend to review separately #837 and #838, in which case I'll continue to keep them up-to-date, or will you tackle the whole change as a whole ? I'm was trying to save you time by splitting the work this way, but I'm not sure it's actually helping you - please clarify what you prefer :)

Aug 08 '24 23:08 Arpafaucon

My suggestion would be to address #837 and #838, and then I can spin up a finishing MR that will tidy things up (without the intermediary lib.cpp, and actually replacing capture.cpp with the content of multicapturec.pp). But you prefer another way, please tell me so I can adapt

Aug 08 '24 23:08 Arpafaucon

Yes, please do the other PRs too.

Aug 08 '24 23:08 wolfpld

Hi ( somewhat new to github dont hesitate to direct me to the correct place for this if it's not here ) The tracy-merge util seems to delete some data on the traces.

Lock
ZoneText

are not visible after a merge

This code when captured and launched on the profiler gives this trace

#include <mutex>
#include <iostream>
#include "Tracy.hpp"

constexpr const char * const hello = "Hello, Tracy!";
constexpr const char * const mutex_name = "myMutex";
int main()
{
	ZoneScoped;
	ZoneText(hello, strlen(hello));
	TracyLockableN (std::mutex, myMutex, mutex_name);

	myMutex.lock();
	sleep(1);
	myMutex.unlock();
}

Screenshot from 2024-11-08 15-02-45

however after running it through the tracy-merge util the lock as well as CPU usage and ZoneText all disappear

Screenshot from 2024-11-08 15-04-00

Nov 08 '24 14:11 afaure42

Hi @afaure42 ! first of all, thanks for taking the time to test this branch :)

Sadly, the limitations you mentioned are expected - cf the first message of the MR -> 2nd limitation

Cons:

we don't support live visualization, only capture

we lose information in the process: frame images, lock events, system stack frames...

the export-merge-import workflow is very inefficient w.r.t to what it could be. With the current approach,

It's a 2-step process, and thus a bit cumbersome

I'm not saying it's impossible to do, but this would require more work. IIRC (I did this MR in summer), including those infos would require some reworks in the export/import process I'm using for this feature (we cannot yet import those infos via the existing TracyWorker import constructor).

Nov 08 '24 14:11 Arpafaucon

Thanks for your answer @Arpafaucon !

I understand the limitations, i'll look into it maybe i can give it a try !

Nov 10 '24 10:11 afaure42

[multicapture 3/3] Add multi-capture capabilities to `capture` + merge utility

Description

How to reproduce / test

Capture time

Merge

Visualize

General refactoring

Integration of multi-capture in tracy-capture

Next steps

Integration of multi-capture in `tracy-capture`