perfview icon indicating copy to clipboard operation
perfview copied to clipboard

perfcollect pid filter produces empty dump in PerfView

Open andredasilvapinto opened this issue 2 years ago • 9 comments

# perfcollect collect sample-pid-threadtime -collectsec 10 -threadtime -pid 1
Collection started. Collection will automatically stop in 10 second(s). Press CTRL+C to stop early.

...STOPPED.

Starting post-processing. This may take some time.

Warning:
Processed 2682678 events and lost 89 chunks!

Check IO/CPU overload!

Generating native image symbol files
...FINISHED
Saving native symbols
Warning:
Processed 2682678 events and lost 89 chunks!

Check IO/CPU overload!

...FINISHED
Resolving JIT and R2R symbols
...FINISHED
Exporting perf.data file
...FINISHED
Compressing trace files
...FINISHED
Cleaning up artifacts
...FINISHED

Trace saved to sample-pid-threadtime.trace.zip

This produced a 173 MB zip file, but opening up in PerfView only shows the CPU Stack option with no content (only ROOT entry in By Name)

Ubuntu 18.04.6 perf version 4.15.18 lttng (LTTng Trace Control) 2.10.2 - KeKriek dotnet 6.0.201 PerfView 3.03

andredasilvapinto avatar Sep 02 '22 15:09 andredasilvapinto

Since the PID is 1, I'm wondering if you're running this inside a container. That may have something to do with this. Can you tell me more about the environment? I also suspect that -pid and -threadtime aren't going to work well together since you are going to need sched_switch events from other processes to get a complete picture.

brianrob avatar Sep 02 '22 22:09 brianrob

Yes, it is inside a container built on top of ubuntu bionic.

andredasilvapinto avatar Sep 06 '22 13:09 andredasilvapinto

This text means that there were lost events during the trace, which does make the capture suspect:

Warning:
Processed 2682678 events and lost 89 chunks!

Check IO/CPU overload!

That said, I would have expected to see something in the trace, unless this resulted in some corruption. Are you able to share the trace file?

brianrob avatar Sep 06 '22 17:09 brianrob

This was done in a production container so I unfortunately cannot share the trace file but I will try to reproduce the issue in a generic container and get back to you.

andredasilvapinto avatar Sep 08 '22 13:09 andredasilvapinto

I noticed an error in perfcollect.log on the dumps with -pid that is not present in the ones without -pid

Running /usr/bin/perf script -i perf-jit.data -F comm,pid,tid,cpu,time,period,event,ip,sym,dso,trace > perf.data.txt
'trace' not valid for hardware events. Ignoring.
'trace' not valid for software events. Ignoring.
'trace' not valid for unknown events. Ignoring.
'trace' not valid for unknown events. Ignoring.
'trace' not valid for unknown events. Ignoring.
Samples for 'cpu-clock' event do not have CPU attribute set. Cannot print 'cpu' field.

Running /usr/bin/perf script -i perf-jit.data -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace > perf.data.txt
  Error: Couldn't find script `comm,pid,tid,cpu,time,event,ip,sym,dso,trace'

Could this be the cause for the empty results in PerfView?

andredasilvapinto avatar Sep 08 '22 15:09 andredasilvapinto

Probably related to https://github.com/dotnet/corefx-tools/issues/92 and https://github.com/dotnet/corefx-tools/issues/84

andredasilvapinto avatar Sep 08 '22 15:09 andredasilvapinto

Yes, it could. Can you try capturing with this private copy of perfcollect and see if that solves the problem?

https://raw.githubusercontent.com/microsoft/perfview/04663ba0aa85c90e832f1e5a2ecb2fef972f2796/src/perfcollect/perfcollect

brianrob avatar Sep 08 '22 17:09 brianrob

For the prod process that I was testing, on this new version it hangs in Resolving JIT and R2R symbols.

I tried in another container with just a kind of hello world process and it still fails but now the log shows two error messages

First with cpu

Running /usr/bin/perf script -i perf-jit.data -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace > perf.data.txt
  Error: Couldn't find script `comm,pid,tid,cpu,time,event,ip,sym,dso,trace'

 See perf script -l for available scripts.

and then without cpu

Running /usr/bin/perf script -i perf-jit.data -f comm,pid,tid,time,event,ip,sym,dso,trace > perf.data.txt
  Error: Couldn't find script `comm,pid,tid,time,event,ip,sym,dso,trace'

 See perf script -l for available scripts.

andredasilvapinto avatar Sep 09 '22 14:09 andredasilvapinto

I suspect that these are two different issues. The hang is likely just the fact that there is a lot of code being used in the app. The more managed code that you use, the larger the "injection" phase will take, which is what occurs during Resolving JIT and R2R Symbols. The failures that you're seeing are due to a having to try different command lines. Are you still seeing nothing when you open the CPU stacks in PerfView? If so, would it be possible for you to share the dockerfile that describes the setup that you're using for the hello world app? I can try it out for myself and debug as necessary.

brianrob avatar Sep 09 '22 22:09 brianrob