Add suspend/resume profiler

Open foxtran opened this issue 11 months ago • 10 comments

This patch implements suspend/resume calls for the profiler that stop it from generating data for the server. This is useful for long-running applications, which can produce extremely large amounts of data.

Users must be careful when calling TracySuspend/TracyResume inside zones.

I have also fixed a couple of bugs related to proper exit of the Tracy client.

Closes #952

foxtran avatar Jan 14 '25 09:01 foxtran

I don't really see how this could work correctly in a multithreaded application.

wolfpld avatar Jan 14 '25 10:01 wolfpld

It just stops collecting data from Tracy calls for the whole application, so it works fine.

For example, here is a 4-threaded OpenMP application that I am using for testing:

image

The client does not get Tracy events, so only one long bar appears after the TracySuspend call. Once TracyResume is called, Tracy works as before.

P.S. I still do not understand why the main thread does not have OpenMP sections :(

foxtran avatar Jan 14 '25 10:01 foxtran

> It just stops collecting data from Tracy calls for the whole application, so it works fine.

This is exactly why it won't work fine.

T1: suspend profiling
T2: enter zone, ignore event
T1: resume profiling
T2: leave zone, send event

At this point the state on client and server is desynchronized and things can't work properly.

wolfpld avatar Jan 14 '25 10:01 wolfpld
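
The interleaving described above can be replayed deterministically in a small model. This is only a sketch under stated assumptions: `Client`, `Event`, and `serverAccepts` are hypothetical names invented for illustration and do not reflect Tracy's actual client or wire protocol. The model assumes the client silently drops events while suspended and the server tracks a per-thread count of open zones.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model, not Tracy's real protocol: one zone event per entry/exit.
struct Event { int thread; bool begin; };

// Hypothetical client that drops every event while suspended.
struct Client {
    bool suspended = false;
    std::vector<Event> sent;  // events that actually reach the server
    void zoneBegin(int t) { if (!suspended) sent.push_back({t, true}); }
    void zoneEnd(int t)   { if (!suspended) sent.push_back({t, false}); }
};

// Replays the stream on the "server". Returns false as soon as a zone-end
// arrives for a thread that has no open zone (the desynchronized state).
bool serverAccepts(const std::vector<Event>& stream, int threads) {
    std::vector<int> depth(threads, 0);
    for (const Event& e : stream) {
        if (e.begin) {
            ++depth[e.thread];
        } else {
            if (depth[e.thread] == 0) return false;
            --depth[e.thread];
        }
    }
    return true;
}

// T1 is thread index 0, T2 is thread index 1.
std::vector<Event> racyInterleaving() {
    Client c;
    c.suspended = true;   // T1: suspend profiling
    c.zoneBegin(1);       // T2: enter zone, event ignored
    c.suspended = false;  // T1: resume profiling
    c.zoneEnd(1);         // T2: leave zone, event sent
    return c.sent;
}
```

In this model the server receives a single zone-end for T2 with no matching zone-begin, so `serverAccepts` rejects the stream.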

Yep, that is true. Luckily, some applications (mostly HPC) have a single control-flow thread, and suspending/resuming can happen only on that thread. So:

T1: suspends profiling
T1: spawns T2-Tn
T1-Tn: do collective work
T1-Tn: may enter and exit zones (according to zone scope rules)
T2-Tn: die (actually, they just sleep)
T1: resumes profiling

foxtran avatar Jan 14 '25 10:01 foxtran

If each zone carried a unique identifier (distinct for every zone instance, even for the same zone repeated in a loop), the GUI app could recognize that a zone was created in suspended mode but exited in resumed mode. Unfortunately, this would not help if a zone was created in resumed mode and ended in suspended mode.

foxtran avatar Jan 14 '25 11:01 foxtran
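
The unique-identifier idea above might look roughly like the following on the receiving side. This is a sketch under assumptions: the `zoneId` field, `ZoneEvent`, `Summary`, and `replay` are hypothetical and are not part of Tracy's event format. It shows both halves of the argument: an end without a known begin can be discarded, while a begin whose end was swallowed stays open.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical event layout with a per-instance id (an assumption for
// illustration, not Tracy's wire format).
struct ZoneEvent { std::uint64_t zoneId; bool begin; };

struct Summary {
    int matched = 0;      // begin and end both received
    int droppedEnds = 0;  // end received, begin was swallowed while suspended
    int stillOpen = 0;    // begin received, end was swallowed while suspended
};

Summary replay(const std::vector<ZoneEvent>& stream) {
    std::unordered_set<std::uint64_t> open;
    Summary s;
    for (const ZoneEvent& e : stream) {
        if (e.begin) {
            open.insert(e.zoneId);
        } else if (open.erase(e.zoneId) == 1) {
            ++s.matched;
        } else {
            ++s.droppedEnds;  // unknown id: safe to ignore
        }
    }
    // Zones left here never get an end event; the id alone cannot fix this.
    s.stillOpen = static_cast<int>(open.size());
    return s;
}
```

The `stillOpen` count corresponds to the unsolved case: a zone that began while profiling was active and ended while it was suspended.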

It seems to me like what you are looking for here is the active parameter in the macros.

wolfpld avatar Jan 14 '25 11:01 wolfpld

active is not a solution, since it still produces a lot of data (mostly from callstacks, I think).

foxtran avatar Jan 14 '25 11:01 foxtran

Callstack collection should be paused when on-demand mode is used and no connection is established. You should be able to extend this to your use case.

wolfpld avatar Jan 14 '25 11:01 wolfpld

This is an example of how I use Tracy on an HPC cluster. It would be a bit hard to detect the right time to start data collection (more specifically, to start tracy-capture) and when to stop it to avoid out-of-memory errors, based on the application's logs. I think I can start tracy-capture from my app for debugging itself :-)

foxtran avatar Jan 14 '25 11:01 foxtran

> active is not a solution, since it still produces a lot of data

Are you sure? If the active argument is false, it's an early out and nothing gets sent: https://github.com/wolfpld/tracy/blob/8e388d6d4edacd579a2486fd8aae6ad53f8628b3/public/client/TracyScoped.hpp#L33

slomp avatar Aug 03 '25 00:08 slomp
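
The early-out behaviour of the active argument can be modelled with a small stand-in class. This is a simplified sketch, not Tracy's code: `MockQueue`, `MockScopedZone`, and `eventsFor` are hypothetical names, and the event counter stands in for whatever the real client would queue. The flag is latched in the constructor, so a zone either emits both its begin and end events or neither.

```cpp
#include <cassert>

// Stand-in for the profiler queue; counts events that would be sent.
struct MockQueue { int events = 0; };

// Simplified model of a scoped zone with an `active` flag (hypothetical).
class MockScopedZone {
public:
    MockScopedZone(MockQueue& q, bool active) : m_queue(q), m_active(active) {
        if (!m_active) return;  // early out: nothing is queued
        ++m_queue.events;       // zone-begin event
    }
    ~MockScopedZone() {
        if (!m_active) return;  // same latched flag, so begin/end stay paired
        ++m_queue.events;       // zone-end event
    }
    MockScopedZone(const MockScopedZone&) = delete;
    MockScopedZone& operator=(const MockScopedZone&) = delete;

private:
    MockQueue& m_queue;
    const bool m_active;
};

// Opens and closes one zone, then reports how many events were queued.
int eventsFor(bool active) {
    MockQueue q;
    {
        MockScopedZone zone(q, active);
    }
    return q.events;
}
```

Because the flag is read once at zone entry, toggling a global condition between entry and exit cannot produce an unpaired event in this model, which is the property the suspend/resume approach lacks.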