perfview TraceLog streaming/in-memory EventPipe support

I'm trying to make changes to enable TraceLog use without creating an ETLX file from an EventPipe (nettrace). I've taken the existing ETW streaming support as a baseline and tried to adapt that. However, I'm having trouble getting the newly added tests to pass because the events coming from TraceLog don't match what I'd get from EventPipeEventSource (the previous test case, Streaming in the same file).

@brianrob maybe if you could have a look at my changes, you could spot an issue what I'm doing wrong (or not doing and I should be) to get this working properly? Or is there some bigger problem I'm missing completely that prevents this from happening?

The use-case I'm going after is a continuously running sampling profiler that only samples as needed (because actually starting the profiler takes a lot of time in the CLR so I want to do it only once).

This is a follow up, though not strictly an implementation of, #1829

May 04 '23 21:05 vaind

@microsoft-github-policy-service agree

May 08 '23 18:05 vaind

@vaind, thanks for working on this. I think at a high level this should work. A couple of thoughts:

It would be worthwhile to architect this similiar to how we do with TraceEventSession for live sessions. This allows the TraceLog to know that it is being used in a live mode instead of a file-based mode. There are some differences in how one can use the TraceLog when its live, because it hasn't seen the "end" of the trace, unlike in the file-based mode.
Can you share what differences you're seeing? It's possible that it's just some piece of data that hasn't made it over to the TraceLog.

May 08 '23 23:05 brianrob

@vaind, thanks for working on this. I think at a high level this should work. A couple of thoughts:

Thanks for getting back to me, it indeed seems to work from my testing in the Sentry profiling SDK, with some quirks so far (I didn't get stack trace method resolution working yet, but from my understanding it should work because all the info has already come from the runtime). Also the ActivityComputer (and others) don't seem to work when streaming at the moment so that would also need some tuning I guess. I was able to get around them for now but I think they're useful so I'll likely get back to them in the future.

It would be worthwhile to architect this similiar to how we do with TraceEventSession for live sessions. This allows the TraceLog to know that it is being used in a live mode instead of a file-based mode. There are some differences in how one can use the TraceLog when its live, because it hasn't seen the "end" of the trace, unlike in the file-based mode.

Yes, I've made the change by reusing the same code, really, with a minor change of setting up the kernelSession in the TraceEventSession factory instead of the common constructor.

Can you share what differences you're seeing? It's possible that it's just some piece of data that hasn't made it over to the TraceLog.

e.g the following line is missing from the captured output but is present in the eventpipe-dotnetcore2.1-linux-x64-objver3.netperf.baseline.txt:

System.Threading.Tasks.TplEventSource/SetActivityId, 1, <Event MSec=   "111.4517" PID="80749" PName="Process(80749)" TID="80749" EventName="SetActivityId" ProviderName="System.Threading.Tasks.TplEventSource" NewId="10000000-0000-0000-0000-000000000001"/>

Also, all the custom events are missing, as present in eventpipe-dotnetcore2.1-linux-x64-tracelogging.netperf.baseline.txt

I'm pretty sure it's a test-code issue, because as I've mentioned earlier, the overall approach seems to work in general, based on my changes to the profiling code in the Sentry SDK which now starts the profiler early in the process and slices profiles when actually needed (drops the other samples if not currently needed). I think this functionality will need better tests here, instead of trying to reuse what was already present in EventPipeParsing/Streaming.

May 09 '23 19:05 vaind

So the issue I'm having with stacktrace info not being available is due to missing modules & method info because the rundown on a session only happens when a session is stopped, while I'm trying to access this info while the sample-profiler is running (on each sample). I've raised an inquiry for this issue: https://github.com/dotnet/runtime/issues/86103

I was wondering whether I could start & stop a session, in the beginning, to force the rundown to happen and then use these events to feed TraceLog. Afterwards, the runtime information would rely on the runtime provider to get updates. @brianrob do you think this could work, conceptually or do you see any issues that would need to be overcome (or prevent this completely)?

May 11 '23 15:05 vaind

I was wondering whether I could start & stop a session, in the beginning, to force the rundown to happen and then use these events to feed TraceLog. Afterwards, the runtime information would rely on the runtime provider to get updates. @brianrob do you think this could work, conceptually or do you see any issues that would need to be overcome (or prevent this completely)?

So I've verified this works fine, I'll just need to add test cases to this repo so that it's covered here.

May 19 '23 17:05 vaind

@brianrob Any chance you could have a first look at TraceLog.cs changes if they seem reasonable? If so, I'd update tests to cover the changes & mark this PR as Ready for review

May 24 '23 07:05 vaind

@vaind, apologies for not getting to this sooner. I am reviewing your code.

May 25 '23 01:05 brianrob

@vaind, the concept here looks good. One thing I think we will need to resolve before this is ready for review is the rundown design. Your instinct to trigger a rundown at the beginning of the trace is the right one. I was going to recommend that you use the StartRundown keyword to do this, but it doesn't look like this is plumbed through the runtime, so it wouldn't have any impact. If I recall correctly, EventPipe has a relatively hardcoded rundown setup, in which case we may not have many options here.

@davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

@vaind, once this is resolved, then I think it will make sense to review this more. I saw some diffs that I want to examine further, but I'm not on a great network connection today and so I don't have access to a quality diff tool.

May 25 '23 02:05 brianrob

@vaind, the concept here looks good. One thing I think we will need to resolve before this is ready for review is the rundown design. Your instinct to trigger a rundown at the beginning of the trace is the right one. I was going to recommend that you use the StartRundown keyword to do this, but it doesn't look like this is plumbed through the runtime, so it wouldn't have any impact. If I recall correctly, EventPipe has a relatively hardcoded rundown setup, in which case we may not have many options here.

Thanks for getting back to me @brianrob. And yes, there doesn't seem a way to trigger a rundown on the same session. I've discussed that in the runtime issue, I've created two weeks ago: https://github.com/dotnet/runtime/issues/86103

May 25 '23 06:05 vaind

@davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

Rundown is not currently configurable with EventPipe. You are remembering correctly, it is hardcoded to run at the end of the session. The internal code has the option to configure whether to do rundown or not, but we don't expose it over the IPC commands so customers can't turn it off or on. It would be fairly easy to expose the option to have rundown or not, and I don't think it would be that much work to have it at the start of a trace. I think we could reuse all the existing code

May 31 '23 08:05 davmason

@davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

Rundown is not currently configurable with EventPipe. You are remembering correctly, it is hardcoded to run at the end of the session. The internal code has the option to configure whether to do rundown or not, but we don't expose it over the IPC commands so customers can't turn it off or on. It would be fairly easy to expose the option to have rundown or not, and I don't think it would be that much work to have it at the start of a trace. I think we could reuse all the existing code

Thanks @davmason. Sounds like we should keep https://github.com/dotnet/runtime/issues/86103 open to track making it possible to configure rundown via IPC, but for now, hacking things a bit here to use EndRundown sounds fine to me.

@vaind, once you're ready, go ahead and mark this as ready for review and I'll take a more thorough pass on it.

May 31 '23 14:05 brianrob

@vaind, once you're ready, go ahead and mark this as ready for review and I'll take a more thorough pass on it.

I've done so. It does need tests but I want to make sure the approach and APIs are OK-ish before trying to devise tests for them.

May 31 '23 15:05 vaind

P.S. I've tested it as a consumer in Sentry .NET SDK. What I meant by needing tests is that it needs tests in this repo.

May 31 '23 15:05 vaind

I've made some changes as requested and replied to some questions.

Keeping the testing open until the questions are resolved.

@brianrob Have you been able to finish reviewing this? Any more concerns or do you think it would be acceptable after adding tests for the changes/new APIs? Also, do you have any preferences for the testing approach?

Jul 06 '23 09:07 vaind

@brianrob this would really help unblock some work we want to do for .NET here at Sentry. Another round of review much appreciated

Jul 24 '23 18:07 bruno-garcia

I've made some changes as requested and replied to some questions. Keeping the testing open until the questions are resolved.

@brianrob Have you been able to finish reviewing this? Any more concerns or do you think it would be acceptable after adding tests for the changes/new APIs? Also, do you have any preferences for the testing approach?

Just did a pass. This is looking good. I added a bit of comment on testing, but not sure if that's what you're looking for. Let me know and I can think on this some more. Thanks.

Jul 25 '23 23:07 brianrob

@brianrob I'd say this is "done" from my POV. Please let me know if there are any more changes you'd like me to make.

The CI fails with a lot of errors I have no idea about. They go along the lines: The nuget command failed with exit code(1) and error(Unable to find version '2.1.30214' of package 'Rx-Interfaces'.

Do you know what that could be about? I've not touched anything related to those AFAIK.

Aug 04 '23 15:08 vaind

/azp run

Aug 08 '23 23:08 brianrob

Azure Pipelines successfully started running 1 pipeline(s).

Aug 08 '23 23:08 azure-pipelines[bot]

FYI, I am in the middle of reviewing this, but need to stop right now. Things are looking good though - I need to look at the test changes and the CI failures.

Aug 08 '23 23:08 brianrob

FYI, I am in the middle of reviewing this, but need to stop right now. Things are looking good though - I need to look at the test changes and the CI failures.

I've tried a couple of things to get this to work but am always hitting a roadblock.
Compilation is fixed by switching to .NET 6.0 in the UseDotNet task - it still seems to compile for net462 just fine. Not sure what the task does exactly as I don't have experience with Azure pipelines. However, tests then start to fail (test runner crashes). I've figured out, this is caused by WPF tests in StackWindowTests.cs. Since I can't find a clean solution without drawbacks, I think it will need some input from you which way you'd want to go from here.

Aug 09 '23 16:08 vaind

FYI I am doing some experiments to see if I can help unblock the tests: https://github.com/microsoft/perfview/pull/1894.

Aug 10 '23 00:08 brianrob

In my testing over in #1894 I discovered that running each of the test assemblies separately works. It seems that the combination of all of the test runs is an issue. I'm wondering if the mix of .NET Framework and .NET Core is the issue - I don't have a clear picture of exactly how that's happening on the test side. I wonder if we need to find a way to do an explicit .NET Core test run and an explicit .NET Framework test run, and choose the specific assemblies that should run for each. This would allow us to avoid targeting the WPF tests at .NET Core, and would allow us to make sure that we're not attempting to run .NET Framework code against .NET Core and vice versa. Thoughts?

Aug 10 '23 23:08 brianrob

In my testing over in #1894 I discovered that running each of the test assemblies separately works. It seems that the combination of all of the test runs is an issue. I'm wondering if the mix of .NET Framework and .NET Core is the issue - I don't have a clear picture of exactly how that's happening on the test side. I wonder if we need to find a way to do an explicit .NET Core test run and an explicit .NET Framework test run, and choose the specific assemblies that should run for each. This would allow us to avoid targeting the WPF tests at .NET Core, and would allow us to make sure that we're not attempting to run .NET Framework code against .NET Core and vice versa. Thoughts?

In that case, how about merging this PR first and then there's not so much duplication going on: https://github.com/microsoft/perfview/pull/1896

Aug 11 '23 07:08 vaind

@brianrob @vaind wondering if there's any update on this? Thanks

Nov 19 '23 16:11 bruno-garcia

bump?

Feb 15 '24 01:02 ericsampson

Hi @brianrob, I've updated this PR with the latest changes from the main branch and all the CI jobs pass. Is there anything I should do to so that these changes can eventually land?

Mar 20 '24 15:03 vaind

This looked fine to me, but I warn my knowledge of the TraceLog portion of TraceEvent is limited so I wouldn't read too much into it. If there were anything subtly wrong I don't expect I would catch it without spending a while researching the internals of TraceLog.

Thanks @noahfalk. No worries - I just wanted to get another set of eyes on this, and also to make you aware of the functionality.

Apr 03 '24 02:04 brianrob

wooo thanks @noahfalk !

Getting there

Apr 05 '24 00:04 bruno-garcia

@brianrob thanks for the review, somehow I've missed it until now.

I've addressed your requests and updated the branch. Let me know if there's anything else needed.

May 23 '24 10:05 vaind

perfview perfview copied to clipboard

TraceLog streaming/in-memory EventPipe support

perfview
perfview copied to clipboard