Tracy does not stop captures right away
When capturing with TRACY_ON_DEMAND, I would expect Tracy to stop recording data when the server sends a "Stop" command (that is, when I click the "Stop" button).
However, in my case, if I start my application with the server connected at boot and press the "Stop" button after 12 s, Tracy still captures zone information for a while (more than 2 minutes; sometimes it does not stop at all).
I think this is because it is trying to catch up with the query backlog, but new information (zones) keeps being added, as can be seen in the following screenshot:
One can see that stack sampling did stop at ~12 s, but zones are still being recorded. If the server is too slow (note that right now both the client and the server are running on my local machine), Tracy might keep recording new data and quickly eat a lot of memory.
At first I thought this was because the Client still considers the server to be connected as it does the following checks:
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
However, I later realized that even when the server has finally stabilized and really stops collecting data, it is still considered connected, so something else seems to be the issue here.
Is IsConnected() the right thing to check (since the server is still connected while retrieving data after clicking stop)?
From what I understand, clicking stop sends a ServerQueryDisconnect.
It seems that the issue is that (in my case) this query itself is queued, while Profiler::HandleServerQuery seems to only receive ServerQueryCallstackFrame queries (which are slow to answer, and either keep getting sent by the server or are simply too numerous).
In any case, as long as ServerQueryDisconnect has not been received, the profiler keeps generating more data, which the server keeps pulling in.
Do we need some kind of special query that disables zones and bypasses the queue? Is that even possible, or a good idea? Or do you have a better idea?
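To make the suspected head-of-line blocking concrete, here is a minimal sketch of a query queue in which the disconnect request either waits behind the backlog or jumps ahead of it. It is not Tracy's code: the Query enum and QueriesBeforeDisconnect are hypothetical names, and the real client and server talk over a socket rather than a shared deque.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical query kinds; only the names echo the ones discussed above.
enum class Query { CallstackFrame, Disconnect };

// Returns how many queries the client has to answer before it sees Disconnect.
// bypass == false: the disconnect request is appended behind the backlog.
// bypass == true:  the disconnect request is placed at the front of the queue.
std::size_t QueriesBeforeDisconnect( std::size_t backlog, bool bypass )
{
    std::deque<Query> queue( backlog, Query::CallstackFrame );
    if( bypass ) queue.push_front( Query::Disconnect );
    else queue.push_back( Query::Disconnect );

    std::size_t answered = 0;
    while( !queue.empty() )
    {
        const Query q = queue.front();
        queue.pop_front();
        if( q == Query::Disconnect ) break;
        // The client still counts as connected here, so zones keep being recorded.
        ++answered;
    }
    return answered;
}
```

With a backlog of 1000 slow queries, the queued variant answers all 1000 before the disconnect is seen, while the bypassing variant answers none.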
Workaround available in 9b624049.
It seems that the issue is that (in my case) this query itself is queued, but Profiler::HandleServerQuery seems to only receive ServerQueryCallstackFrame queries. (Which are slow to answer, and either keep getting sent by the server or are just way many).
Recently, call stack resolution was moved to a separate thread, and this is not handled properly in the disconnect data handling logic.
Do we need some kind of special query that disables zones and bypasses the queue? Is that even possible or a good idea?
This is probably also a factor here. The solution is rather simple, as the send buffer size calculation logic already takes priority messages into account:
serverQuerySpaceLeft = std::min( ( m_sock.GetSendBufSize() / ServerQueryPacketSize ), 8*1024 ) - 4; // leave space for terminate request
So, for example, the terminate request can skip the queue and be sent directly.
void Worker::QueryTerminate()
{
    ServerQueryPacket query { ServerQueryTerminate, 0, 0 };
    m_sock.Send( &query, ServerQueryPacketSize );
}
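The reserved-slot arithmetic from the serverQuerySpaceLeft line can be checked in isolation. The sketch below is only an illustration: QuerySpaceLeft, kMaxQueries and kReservedSlots are hypothetical names, and the buffer and packet sizes passed in are example values, not the ones Tracy actually uses.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative constants mirroring the literals in the line quoted above.
constexpr int kMaxQueries = 8 * 1024;   // upper bound on in-flight regular queries
constexpr int kReservedSlots = 4;       // headroom kept free for priority requests

// Number of regular queries that may be sent while still leaving
// kReservedSlots packets of room in the send buffer.
int QuerySpaceLeft( int sendBufSize, int packetSize )
{
    return std::min( sendBufSize / packetSize, kMaxQueries ) - kReservedSlots;
}
```

For example, assuming a 64 KiB send buffer and a 16-byte packet, 4096 packets fit, so 4092 regular queries may be outstanding and a priority request such as terminate always has room to be sent directly.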
The workaround does help stop the capture right away, thanks! However (as expected), a lot of string names will be missing and the server will display ??? instead.
This was discussed a bit on Discord, not all strings will get resolved under load (which somehow happens really easily at application startup due to stack sampling and resolution being slower than the amount of data generated by the application).
Here is the scenario that causes the issue, and what I was able to debug:
- Server queries information about callstacks and zones
- Server queries zone X information (pointers to strings)
- Server queues a lot more queries, including slow ones
- Server queues ServerQueryDisconnect as user clicks "Stop"
- Server will receive the zone information (late, due to the huge amount of queries/slow stack resolution queries). Since ServerQueryDisconnect was sent, the server does not ask for the zone X strings (name, location, ...), even though the zone query may have been issued well before the user clicked "Stop".
So there are two issues at hand here:
- We should tell the client to stop sending data about new zones as soon as possible to avoid overloading the queue
- The server may need to have 3 states (connected/closing connection/stopped) instead of simply connected/stopped. I think this is more or less linked to the TRACY_ON_DEMAND mode only. While in the closing connection state, the server would still continue to query any remaining string / stack resolution for zones/callstacks that arrived before the user clicked stop.
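The three-state idea could look roughly like the sketch below. This is a hypothetical model rather than Tracy's Worker class: ConnState, CaptureSession and all member names are made up for illustration, and outstanding string / callstack queries are reduced to a simple counter.

```cpp
#include <cassert>

// Hypothetical server-side connection states, as proposed above.
enum class ConnState { Connected, Closing, Stopped };

struct CaptureSession
{
    ConnState state = ConnState::Connected;
    int pending = 0;   // follow-up queries still waiting for an answer

    // A follow-up query (string, callstack frame, ...) was issued.
    void QueueQuery() { ++pending; }

    // The user clicked "Stop": drain outstanding work before fully stopping.
    void OnUserStop()
    {
        state = ( pending == 0 ) ? ConnState::Stopped : ConnState::Closing;
    }

    // One outstanding query was answered by the client.
    void OnQueryAnswered()
    {
        if( pending > 0 ) --pending;
        if( state == ConnState::Closing && pending == 0 ) state = ConnState::Stopped;
    }

    // Should the server still resolve strings for an item it has received?
    bool ShouldResolve( bool arrivedBeforeStop ) const
    {
        if( state == ConnState::Connected ) return true;
        if( state == ConnState::Closing ) return arrivedBeforeStop;
        return false;
    }
};
```

The design choice here is that Closing only drains work tied to data captured before the stop request, so the backlog can shrink to zero instead of growing with new zones.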
not all strings will get resolved under load (which somehow happens really easily at application startup due to stack sampling and resolution being slower than the amount of data generated by the application).
This should be fixed on master.
It seems that it is now indeed resolving zones before callstacks!