perfview Add support for collecting CLR events through EventPipe

Add -eventpipe option to activate the EventPipe collection
Add -providers provider:keyword:verbosity:tags,... option to allow fine-grained tuning of dotnet-trace execution
Add -sdk-path to point to an already installed dotnet SDK (otherwise it will be installed in /tmp/dotnet_sdk_tool)
When the collection is done, trace.nettrace and eventpipe.log files are included in the resulting zip file, which perfview is now able to read
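
For illustration, the -providers value could be split along the following lines. This is a minimal Python sketch, not the perfcollect implementation: it assumes entries are comma-separated, fields are colon-separated in the order shown above, trailing fields may be omitted, and the tags field is kept as an opaque string. The names ProviderSpec and parse_providers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderSpec:
    """One entry of a provider:keyword:verbosity:tags list (hypothetical type)."""
    name: str
    keyword: Optional[str] = None
    verbosity: Optional[str] = None
    tags: Optional[str] = None


def parse_providers(spec: str) -> List[ProviderSpec]:
    """Split a comma-separated provider list into its fields.

    Assumption: fields after the provider name are optional, and anything
    after the third colon is kept verbatim as the tags string.
    """
    providers = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        fields = entry.split(":", 3)  # at most 4 fields
        if not fields[0]:
            raise ValueError(f"provider name missing in {entry!r}")
        fields += [None] * (4 - len(fields))  # pad omitted fields
        name, keyword, verbosity, tags = fields
        providers.append(
            ProviderSpec(name, keyword or None, verbosity or None, tags or None)
        )
    return providers
```

Calling parse_providers("Microsoft-Windows-DotNETRuntime:0x1:5,MyProvider") would yield two entries, the second with only a name set; the keyword and verbosity values here are purely illustrative.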

Oct 16 '20 08:10 chrisnas

Hi, I'd like to push this PR forward to be able to merge https://github.com/criteo-forks/perfview/commit/747f2a8fe29f2fdf8ad8a71544c73df35601f14a so that we stop relying on the perfview fork internally :)

I discussed with @chrisnas and @gleocadie, and it seems that some tests were missing. I also see that 1 test in the current build failed, but the build result is long gone. @brianrob could you please re-trigger the test if there's such an option?

May 20 '21 13:05 ezsilmar

/azp run

May 24 '21 15:05 brianrob

Azure Pipelines successfully started running 1 pipeline(s).

May 24 '21 15:05 azure-pipelines[bot]

Thanks @ezsilmar. Just triggered a new CI run.

May 24 '21 15:05 brianrob

Hello! @brianrob I got some time to come back to this PR and would be glad to get a code review.

I mainly fought test instability:

Disabled test parallelization: this was already the case for most test projects
Improved handling of shared directories in EtlTestBase
In the perfcollect install step for Ubuntu, removed the packages that are missing and used linux-tools-generic instead
In the container tests for dotnet-trace, added a sleep after launching the test program

About the last point, something weird is happening. If I attach to the process with dotnet-trace right after the process is started, dotnet-trace hangs forever after printing "Stopping the trace. This may take up to minutes depending on the application being traced." If I wait a couple of seconds before attaching, it works fine. This behavior reproduces in the GitHub build pipeline, so there's probably a bug in dotnet-trace.
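
The workaround can be sketched as follows: wait a little after launching the target, then run the attach command under a timeout so that a hang fails fast instead of blocking the whole test run. This is a hypothetical Python sketch, not the test code from this PR; attach_after_delay, the delay, and the timeout are assumptions, and the attach command is supplied by the caller.

```python
import subprocess
import time
from typing import Optional, Sequence


def attach_after_delay(
    attach_cmd: Sequence[str],
    delay_seconds: float,
    timeout_seconds: float,
) -> Optional[int]:
    """Run attach_cmd after a grace period, bounded by a timeout.

    Returns the command's exit code, or None if it did not finish in time
    (subprocess.run kills the child before raising TimeoutExpired).
    """
    # Give the freshly started target time to initialize before attaching.
    time.sleep(delay_seconds)
    try:
        completed = subprocess.run(list(attach_cmd), timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

In a test, attach_cmd might be something like ["dotnet-trace", "collect", "--process-id", str(pid)], with a None result reported as a failure rather than left to hang.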

Sep 17 '21 12:09 ezsilmar

The test currently failing is CanReadV4EventPipeTraceBiggerThan4GB, which runs out of memory; it passes on my machine.

Sep 17 '21 14:09 ezsilmar

Hello, this PR has been hanging for almost 3 years, but it is still relevant in our context, and I think it'd be beneficial for the community as well.

As a reminder of what this is all about: we often use PerfView on Windows to analyze the behavior of dotnet apps running on Linux. Relevant to this PR, PerfView can understand:

perf CPU samples, collected with perfcollect
lttng text data file, collected with perfcollect
nettrace file, collected with dotnet-trace

In the days of net2, using perf+lttng was the only way. Perfcollect greatly eased the process by combining events and CPU samples into a single .trace.zip file. Later, dotnet-trace became a thing, leaving perfcollect almost abandoned (at least that's my feeling). While dotnet-trace is much more convenient than lttng for events, it was never intended to match the capabilities of perf. Thus today, when we need both CPU sampling and events, we deal with two separate artifacts: a perfcollect output and a nettrace file.

This PR is a quality-of-life change that allows the nettrace file to be packaged in .trace.zip, alongside perf data. A nice side effect is that the .nettrace file gets zipped, which is important for sharing long sessions. The PR also modifies perfcollect to be able to use dotnet-trace under the hood; however, this part is not important for my particular use case, as we run perf and zip directly in our troubleshooting code.

If modifying perfcollect is not something you'd like to support, we could just merge the change in PerfViewData.cs that tries to read .nettrace from .trace.zip: it's small and beneficial on its own. Wdyt?
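
To make the idea concrete, looking for a .nettrace entry inside a .trace.zip amounts to something like the sketch below. It is written in Python purely for illustration and is not the PerfViewData.cs change itself; find_nettrace_entry is a hypothetical helper, and matching on the file extension alone is an assumption.

```python
import zipfile
from typing import BinaryIO, Optional, Union


def find_nettrace_entry(archive: Union[str, BinaryIO]) -> Optional[str]:
    """Return the name of the first .nettrace entry in the archive, if any."""
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Assumption: the trace is identified by its extension only.
            if name.lower().endswith(".nettrace"):
                return name
    return None
```

A reader could then open that entry as an EventPipe trace and fall back to the existing perf and lttng handling when the function returns None.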

Mentioning @brianrob as the last reviewer

Feb 03 '23 16:02 ezsilmar