[multicapture 3/3] Add multi-capture capabilities to `capture` + merge utility
suggestion towards #471
Description
While a network-level multiplex seems the most transparent way to implement multiprocess support, @cipharius and I have seen that it requires an advanced and very precise knowledge of the packets in transit. Thus there is an inherent fragility to it (cf https://github.com/wolfpld/tracy/pull/766#issue-2226491140).
Here, I wanted to propose a "dumber" and more stable approach that does not have to know a lot about the internals. My goal by doing so was to provide a first step to support multi-process workflows, with a minimal amount of maintenance. In the limited amount of work time I got, this looked like a more reasonable approach to explore first.
The general approach
-
multi-captureis a "generalized" version oftracy-capturethat listens to UDP discovery packets. Its job is to run one "capture" per discovered process. In more details, this boils down to creating multipleWorkerinstances pointing to the right addresses, and let them do their job. -
mergecan combine multiple trace files into a single one. Its strategy is- for each input file
- ask a
Workerto read it (like we do when opening the profiler GUI) - convert its internal data to lists of
ImportEventTimeline,ImportEventMessages...
- ask a
- merge all these event lists
- spawn a Worker to import those merged lists (that's the constructor we use for
import-chrome/import-fuchsia - ask this worker to save its data into an output file
- for each input file
Pros
- Both tools are very simple: they don't use
Workerinternals, nor do assumptions about the way things are stored / exchanged on the network - ...and as a consequence I expect them to be easy to maintain in the long term
- there isn't any performance impact for people working in single-process mode
- on its own,
multicaptureis useful too: it helps dealing with the fact that multiple tracy-enabled processes might be living at the same time, and the user might not know which one the standardcapturewill talk too.
Cons:
- we don't support live visualization, only capture
- we lose information in the process: frame images, lock events, system stack frames...
- the export-merge-import workflow is very inefficient w.r.t to what it could be. With the current approach,
- It's a 2-step process, and thus a bit cumbersome
While (1) is definitely a consequence of the design, I think (4) is totally fixable. We can work gradually work to improve the state of (2) and (3) by iterating on the existing import interface (that could benefit the import-chrome/fuchsia utilities too) and on the export process. It might be even possible to directly merge multiple Workers, but I haven't looked into that.
I am very aware that this is NOT the design path @wolfpld suggested when discussing multi-process support; so I wanted to show you the code as soon as I felt it worthy of your time.
As mentioned in the pros, it might well also be that multicapture is a sufficient solution on its own for some use cases. For my individual experience, this would already solve most of the pain when profiling a library used by several programs running together - but I can't presume whether this is a frequent pain point.
How to reproduce / test
Capture time
- simply run
tracy-multicapturewith a prefix name
./tracy-multicapture -o test
tracy-multi-capture-reenc.webm
Merge
- give the list of input traces to
tracy-merge
# or you can use a glob pattern `test.*.tracy` to select them all
./tracy-merge -o merged-output.tracy test.449505.tracy test.449456.tracy
reading test.449505.tracy
Waiting for source locations
tracy-test @ 2024-07-12 17:53:23 (tracy-test)
- 22 events
- 2776 messages
- 4 plots
reading test.449456.tracy
Waiting for source locations
tracy-test @ 2024-07-12 17:53:22 (tracy-test)
- 22 events
- 1038 messages
- 4 plots
Writing merged-output.tracy
done
Visualize
./tracy-profiler merged-output.tracy
multi-captureis a "generalized" version oftracy-capturethat listens to UDP discovery packets. Its job is to run one "capture" per discovered process. In more details, this boils down to creating multipleWorkerinstances pointing to the right addresses, and let them do their job.
we don't support live visualization, only capture
It may be worth considering if this can be done within the capture utility, as a step towards multiple simultaneous traces, like requested in #203.
I am very aware that this is NOT the design path @wolfpld suggested when discussing multi-process support; so I wanted to show you the code as soon as I felt it worthy of your time.
It's ok.
It may be worth considering if this can be done within the capture utility
I can have a look into that
It may be worth considering if this can be done within the capture utility
Some personal notes after a short glance:
:heavy_plus_sign: we can definitely do better in multicapture to emulate the capture experience
- reuse the same progress line (with Tx/Rx, elapsed time infos)
- print failures when they happen
:heavy_minus_sign: but I fear we won't be able to seamlessly integrate multi-capture
- minor, but I don't think we can keep the live-update print with multiple workers (we would need to go back to previous lines, that's not portable accross the supported platforms, is it?)
- there'll be some work in the CLI interface to properly convey the options available in multi-mode
-
addressandportdon't really make sense in multi-capture (though we could maybe later filter discovered clients by their IP address / ports) -
memlimitis easy to implement by worker, quite harder if the user expects to control the combined memory footprint of all of them - for the output file(s)
- in multi mode, maybe we can consider
-oas the common filename prefix, with the special case that-o name.tracymeans the set of filesname.<pid>.tracy - what do you recommend to do if we see that some
prefix.<number>.tracyexist and the user has not given-f? I'd recommend raising an error and exiting, but that might be overly restrictive (we'll only have a collision if we profile a process that has exactly this number, this is probably rare ... but we won't know until runtime).
- in multi mode, maybe we can consider
-
Next steps;
- I'll move functions that can be used in both utilities in a shared header, to at the very least minimize duplication
- I'll have a go implementing the missing features in multi-capture so it can act as a drop-in replacement
I'll push that further to the code and see how it goes
minor, but I don't think we can keep the live-update print with multiple workers (we would need to go back to previous lines, that's not portable accross the supported platforms, is it?)
I have no idea, but there are programs that are capable of doing that. Maybe the readline library would be of use here?
there'll be some work in the CLI interface to properly convey the options available in multi-mode
Valid points, but not show-stoppers. It may make sense to split the available options into three categories:
- Common options.
- Single capture options.
- Multi capture options.
I have no idea, but there are programs that are capable of doing that. Maybe the readline library would be of use here?
For me that was just about editing a single line (if you indeed mean https://tiswww.case.edu/php/chet/readline/rltop.html)
Since then I've found this SO thread - I'll have to test it but that might do the trick for linux at least : https://stackoverflow.com/a/11474509/7002298
I updated the branch with an updated multicapture utility that embeds the single capture mode (the end goal being that we merge this new file as capture.cpp, but for the time being it helps to have the reference source untouched)
General refactoring
- I have extracted some functions/constants from the code here and there to make them available without duplication. You can see the diff here: https://github.com/wolfpld/tracy/commit/3d235b8451159f97446a103578adea7747312d85. If you prefer, I can spin up a dedicated PR for this diff, there should not be any functional impact.
- Within the
capture/folder. I moved some common utilities to stand-alone functions inlib.cppto reduce duplication: handshake message check, instrumentation failure print, and (not yet integrated into multicapture mode) live update print. Again, the goal being there to avoid any functional impacts.
Integration of multi-capture in tracy-capture
Valid points, but not show-stoppers. It may make sense to split the available options into three categories:
I have a working argument parser that will dispatch then to runCaptureSingle or runCaptureMulti.
- I added support for long options on the way
- Here is the help message, do you think it clear enough ?
❯ ./tracy-multicapture
Usage: multicapture -o <output/prefix> [--multi] [-a address] [-p port] [-s seconds] [-m percent]
Detect and capture multiple tracy clients from their UDP broadcast,
and store the output in files named '<PREFIX>.<client_PID>.tracy'.
Options (a SINGLE tag indicates the option is only accessible in single capture mode):
-o/--output <str> Output file path (in MULTI mode, it is interpreted as a prefix)
-f/--force Overwrite existing files
-M/--multi Enable multi-capture mode
-v/--verbose Verbose output
-a/--address <ipv4 str> [SINGLE] Target IP address
-p/--port <int> [SINGLE] Target port
-s/--stop-after <float> [SINGLE] Stop profiling after this duration (in seconds)
-m/--memlimit <float> [SINGLE] Set a memory limit (in % of the total RAM)
In single-capture mode, 'capture' directly connects to the TCP data stream at address:port .
In multi-capture mode, profiled targets are detected from UDP broadcast packets;
capture stops once all detected targets disconnect.
- there are some options we could make use of in multicapture, but I suggest to defer that to a later MR
Next steps
Still to do:
- live print in multi-capture
- spin up a PR for a minor addition in
tracy-test: the ability to trigger some failures to check the profiler's response to them (commit: 59b2f2ef378ee907eb37c578405d8bd6639b853d) -> #833 - there are build failures in the CI for MacOS, to investigate
And to recap the things I'd welcome your input on:
- I'd feel more confident if we adopt this "embedding" strategy (=limit as much as possible the changes in the single-capture mode) as a first step to add support for multicapture. This way, we're sure to avoid breaking anything on the single-capture use-case, while being able to gradually refactor and merge the logic. WDYT ?
- OK with the CLI interface + help message ?
- OK with the refactoring commit (notably in
profiler/main.cpp). Do you prefer a separate PR ? - maybe the
mergerpart can go to a separate PR ?
EDIT: updated the link to commit after rebasing to master
Quite happy about this live update, what do you think ?
It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?
Quite happy about this live update, what do you think ?
It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?
Looks great!
If an error occurs (cf other MR ^^), we get this
It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?
I've now hidden most of the relevant (at least I think) additional infos behind --verbose
Marking the MR as ready for review
- CI passed on linux, windows and MacOS
- I'll remove the commit related to
.clang-format-> suggested in a different PR #835 , your call - If the code looks good I can replace
capturesource code with the updated version (and remove the existence oflip.cppfor common functions)
Regarding merge strategy, I can spin up smaller MRs with more atomic changes if you want to review things in multiple steps. It would look like
- (A) refactor existing code to extract common functions / constants
- (B) support for multi-process capture in
capture - (C)
tracy-mergeutility
It seems there are some commits here which are intended as fixups in a rebase.
It seems there are some commits here which are intended as fixups in a rebase.
not sure what you mean by that; in any case I rebased and cleaned up the branch history to clarify the separation A/B/C as mentioned in https://github.com/wolfpld/tracy/pull/825#issuecomment-2236307872
* 26df5524 (HEAD -> multicapture-merge) [multicapture] New binary, move common functions to lib.cpp
* 9111933e [capture] Refactor: extract common functions
* 11bea71b [merger] multiprocess/ folder, with tracy-merge binary
* 59a92210 [refactor] Extract useful functions/constants for capture
I extracted the non-functional refactors into #837 to simplify the review, and make sure those changes are not impacting anything. That may be a good starting point to merge the changes, and will reduce the diff here.
Same for #838 for the merge utility that has no interaction with the changes in capture
Note: if you're interested, I can also spin up a dedicated PR for the internal refactor in capture.cpp (commit https://github.com/wolfpld/tracy/commit/9111933e08a82fa50e902210b109e8bf5b0759f6). This way, the diff for the actual functional changes of multi-capture will be much clearer
Please run the changes you are proposing through clang-format. At this point it's hard to review.
Please run the changes you are proposing through clang-format. At this point it's hard to review.
Sure ! Just did it on this MR.
To clarify, do you intend to review separately #837 and #838, in which case I'll continue to keep them up-to-date, or will you tackle the whole change as a whole ? I'm was trying to save you time by splitting the work this way, but I'm not sure it's actually helping you - please clarify what you prefer :)
My suggestion would be to address #837 and #838, and then I can spin up a finishing MR that will tidy things up (without the intermediary lib.cpp, and actually replacing capture.cpp with the content of multicapturec.pp). But you prefer another way, please tell me so I can adapt
Yes, please do the other PRs too.
Hi ( somewhat new to github dont hesitate to direct me to the correct place for this if it's not here ) The tracy-merge util seems to delete some data on the traces.
- Lock
- ZoneText
are not visible after a merge
This code when captured and launched on the profiler gives this trace
#include <mutex>
#include <iostream>
#include "Tracy.hpp"
constexpr const char * const hello = "Hello, Tracy!";
constexpr const char * const mutex_name = "myMutex";
int main()
{
ZoneScoped;
ZoneText(hello, strlen(hello));
TracyLockableN (std::mutex, myMutex, mutex_name);
myMutex.lock();
sleep(1);
myMutex.unlock();
}
however after running it through the tracy-merge util the lock as well as CPU usage and ZoneText all disappear
Hi @afaure42 ! first of all, thanks for taking the time to test this branch :)
Sadly, the limitations you mentioned are expected - cf the first message of the MR -> 2nd limitation
Cons:
- we don't support live visualization, only capture
- we lose information in the process: frame images, lock events, system stack frames...
- the export-merge-import workflow is very inefficient w.r.t to what it could be. With the current approach,
- It's a 2-step process, and thus a bit cumbersome
I'm not saying it's impossible to do, but this would require more work. IIRC (I did this MR in summer), including those infos would require some reworks in the export/import process I'm using for this feature (we cannot yet import those infos via the existing TracyWorker import constructor).
Thanks for your answer @Arpafaucon !
I understand the limitations, i'll look into it maybe i can give it a try !