Refactoring ETW kernel session
Switching from the shared (singleton) "NT Kernel Logger" over to private Kernel logging session (Windows 8+). VSync events have also been unified under the same private session and event record callback.
I'll be adding CPU frequency/throttling events in the future.
As part of this refactoring, I am also inclined to move the following event definitions:
https://github.com/wolfpld/tracy/blob/886a0abec9f5267f2bf145eb698bac80df80b27a/public/client/TracySysTrace.cpp#L83-L141
into the newly introduced TracyETW.cpp file.
Any objections?
These structs are used in EventRecordCallback. How will this work now?
These structs are used in
EventRecordCallback. How will this work now?
The EventRecordCallback would just reference them through the tracy::etw namespace (e.g. etw::CSwitchEvent).
I am also thinking about defining the "magic" opcode numbers as static constexpr members of these structs:
if( hdr.EventDescriptor.Opcode == etw::CSwitchEvent::Opcode )
{
const auto cswitch = (const etw::CSwitchEvent*)record->UserData;
...
}
As part of this refactoring, I am also inclined to move the following event definitions:
https://github.com/wolfpld/tracy/blob/886a0abec9f5267f2bf145eb698bac80df80b27a/public/client/TracySysTrace.cpp#L83-L141
into the newly introduced
TracyETW.cppfile. Any objections?
None for me, centralizing ETW stuff might even be useful in the future if we ever want to reuse those to allow importing ETL captures
I moved the events to TracyETW.cpp and added some comments and some alignment/size checks:
https://github.com/wolfpld/tracy/pull/1213/commits/7b5de9e2b9ae74ff1efd3d0453a4aee837dcade8
We could add some runtime checks during system trace initialization to query the events (and respective fields) we care about via the TDH API, but that's going to be a bit annoying to do.
(don't merge it yet; there seems to be a quirk with VSync I need to investigate)
Actually, never mind...
I thought there was something funny going on with VSync event setup, but it turns out VSync events are just funky.
Sometimes, out of nowhere, I stop receiving them, and that's the case with the current master branch as well.
After a while, I start receiving them again.
@Lectem About the VSync issue is associated with "best-effort" guarantees that ETW makes regarding high volume events. ETW will happily let you enable a provider in several sessions, and will deliver events the best it can, but for high volume events (and apparently, VSync is considered high-volume enough), it won't guarantee that such events will be logged to all sessions where the provider is enabled.
This also explains why I was getting DxgKrnl events, just not the VSync ones. I found out I still had a "Tracy VSync" and an "NT Kernel Logger" that enabled DxgKrnl provider, and after stopping them, I started to get VSync events. It also explains why I would also not receive VSync events occasionally rom the "Tracy VSync" in master (another session I was experimenting with was still running).
@Lectem About the VSync issue is associated with "best-effort" guarantees that ETW makes regarding high volume events. ETW will happily let you enable a provider in several sessions, and will deliver events the best it can, but for high volume events (and apparently, VSync is considered high-volume enough), it won't guarantee that such events will be logged to all sessions where the provider is enabled.
This also explains why I was getting DxgKrnl events, just not the VSync ones. I found out I still had a "Tracy VSync" and an "NT Kernel Logger" that enabled DxgKrnl provider, and after stopping them, I started to get VSync events. It also explains why I would also not receive VSync events occasionally rom the "Tracy VSync" in master (another session I was experimenting with was still running).
For the future, we need to figure out a way to query whether a (high-volume) provider is enabled in another concurrent session or not. If you have ideas on how to do that, let me know.
For the future, we need to figure out a way to query whether a (high-volume) provider is enabled in another concurrent session or not. If you have ideas on how to do that, let me know.
It looks like what we need is EnumerateTraceGuidsEx() with the TraceGuidQueryInfo argument.
We'll get back a TRACE_PROVIDER_INSTANCE_INFO struct with an EnableCount field, and an array (of that size) of TRACE_ENABLE_INFO structs with information about each enabled instance, including the MatchAny|AllKeyword filters.
It looks like what we need is
EnumerateTraceGuidsEx()with theTraceGuidQueryInfoargument. We'll get back aTRACE_PROVIDER_INSTANCE_INFOstruct with anEnableCountfield, and an array (of that size) ofTRACE_ENABLE_INFOstructs with information about each enabled instance, including theMatchAny|AllKeywordfilters.
Here's what I got from my system:
DxgKrnl GUID: 802EC45A-1E99-4B83-9920-87C98277BA9D
EnableCount: 3
LoggerId: 10 | Name: EventLog-System | Level: 255 | Any: 0x0C00000000000000 | All: 0x0000000000000000
LoggerId: 7 | Name: DiagLog | Level: 255 | Any: 0x0000000002030000 | All: 0x0000000000000000
LoggerId: 44 | Name: FrameViewService | Level: 4 | Any: 0x0000000008000001 | All: 0x0000000000000001
The first two are from Windows itself, but they don't collect VSync events (judging by the Match keyword flags). The last one, FrameViewService, is from NVIDIA, and is monitoring VSync events.
Experimentally, it looks like only two sessions at most will end up receiving VSync events.
Interesting, so if you use both PresentMon and Tracy at the same time for example, things will be dropped?
Interesting, so if you use both PresentMon and Tracy at the same time for example, things will be dropped?
That's the thing: maybe. I am not sure how these limits apply. For example, I was able to start 6 private kernel sessions concurrently and receive VSync events in all of them (I actually ran out of kernel sessions, the system-wide limit seems to be 8).
But other times, I simply was not getting any VSync event, yet was getting all other DxgKrnl events. Seems pretty random and arbitrary, and I don't think there's an easy (or any) way to control or circumvent it. I think this is rare enough that it should not matter. It could also be tied to the mixture of other high-volume event providers that the concurrent sessions are collecting.
I added some code to detect (and log) whether a provider has been enabled by another session elsewhere.
I think this concludes all the review changes and quirks I encountered.