
perfcollect produces dump with empty lttng folder

Open andredasilvapinto opened this issue 2 years ago • 10 comments

perfcollect collect sample-threadtime -collectsec 10 -threadtime
Collection started. Collection will automatically stop in 10 second(s). Press CTRL+C to stop early.

...STOPPED.

Starting post-processing. This may take some time.

Generating native image symbol files
...FINISHED
Saving native symbols
...FINISHED
Resolving JIT and R2R symbols
...FINISHED
Exporting perf.data file
...FINISHED
Compressing trace files
...FINISHED
Cleaning up artifacts
...FINISHED

Trace saved to sample-threadtime.trace.zip

This produced a 1.1 GB zip file. When I open it in PerfView, I only get a CallStack option (I was expecting off-CPU events too, though I am not sure how they would be presented). When I extract the zip file, the folder lttngTrace\auto-20220902-145202\ust\uid\0\64-bit is empty.
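
One way to confirm which parts of an extracted trace actually contain data is to count the regular files under a folder. This is a minimal sketch; the helper name `count_files` is made up for illustration and is not part of perfcollect, and the folder to pass in depends on where the archive was unzipped.

```shell
#!/bin/sh
# count_files DIR: print the number of regular files anywhere under DIR.
# Hypothetical helper for illustration; not part of perfcollect.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example (the target directory name "extracted" is an assumption):
#   unzip sample-threadtime.trace.zip -d extracted
#   count_files extracted
```

Pointing it at the lttngTrace folder of the extracted archive and getting 0 would match the symptom described here: the directory tree exists, but LTTng wrote no event files into it.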

Ubuntu 18.04.6
perf version 4.15.18
lttng (LTTng Trace Control) 2.10.2 - KeKriek
dotnet 6.0.201
PerfView 3.03

andredasilvapinto avatar Sep 02 '22 15:09 andredasilvapinto

LTTng output is controlled by environment variable. Can you confirm that you set COMPlus_EnableEventLog=1 for the dotnet process?
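
On Linux, the environment a process was started with can be read from /proc/<pid>/environ, which is one way to confirm the variable reached the dotnet process itself. The following is a sketch under that assumption; `environ_has` is a hypothetical helper name, and the `pgrep` pattern in the example is a guess at how the process is named.

```shell
#!/bin/sh
# environ_has FILE NAME=VALUE: succeed if the NUL-separated environment block
# in FILE (for example /proc/<pid>/environ) contains exactly that entry.
# Hypothetical helper for illustration.
environ_has() {
    tr '\0' '\n' < "$1" | grep -qxF -- "$2"
}

# Example (the process name pattern is an assumption):
#   pid=$(pgrep -o dotnet)
#   environ_has "/proc/$pid/environ" "COMPlus_EnableEventLog=1" && echo "set"
```

Note that this file reflects the environment at process start, so a variable exported in a shell after the process was launched will not show up.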

brianrob avatar Sep 02 '22 22:09 brianrob

I have set DOTNET_PerfMapEnabled=1 and DOTNET_EnableEventLog=1 for the container, as per https://docs.microsoft.com/en-us/dotnet/core/diagnostics/trace-perfcollect-lttng#collect-a-trace. I have also added the SYS_ADMIN capability.

Do I also need COMPlus_EnableEventLog=1?

andredasilvapinto avatar Sep 06 '22 09:09 andredasilvapinto

Adding COMPlus_EnableEventLog=1 makes no difference.

andredasilvapinto avatar Sep 06 '22 15:09 andredasilvapinto

Environment variables that start with DOTNET_ are treated the same as those that start with COMPlus_. COMPlus_ is the original/historical prefix. Given that the trace is that large, I suspect that data was captured by perf, but not by LTTng. Can you please unzip the trace and share perfcollect.log? This might help us to understand what's happening.
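
To illustrate the prefix equivalence, a script that checks a setting can look under both names. This is only a sketch: `runtime_setting` is a hypothetical helper, and the order in which the two prefixes are consulted here is an assumption of the sketch, not a statement about the runtime.

```shell
#!/bin/sh
# runtime_setting NAME: print the value of a .NET runtime setting from the
# current environment, checking DOTNET_NAME first and then COMPlus_NAME.
# The lookup order is an assumption made for this sketch.
runtime_setting() {
    printenv "DOTNET_$1" || printenv "COMPlus_$1"
}

# Example:
#   export DOTNET_EnableEventLog=1
#   runtime_setting EnableEventLog
```

The function returns a non-zero status when neither variable is set, so it can also be used directly in an `if` condition.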

brianrob avatar Sep 06 '22 17:09 brianrob

I have attached the perfcollect.log file from the latest collection, which I ran yesterday.

andredasilvapinto avatar Sep 07 '22 10:09 andredasilvapinto

The log shows that all of the LTTng commands failed, because a channel couldn't be created. Can you please try running the following:

lttng create
lttng add-context --userspace --type vpid

I don't see anything in the log for lttng create, but would like to understand if this failed. FWIW, LTTng userspace should work within containers.

brianrob avatar Sep 07 '22 17:09 brianrob

# lttng create
Spawning a session daemon
Session auto-20220908-131306 created.
Traces will be written in /root/lttng-traces/auto-20220908-131306
# lttng add-context --userspace --type vpid
Error: vpid: UST create channel failed
Warning: Some command(s) went wrong
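
When repeating this check across several containers, the failure can be detected mechanically by counting output lines that start with "Error:", the prefix seen in the transcript above. A minimal sketch, with `count_lttng_errors` as a hypothetical helper name:

```shell
#!/bin/sh
# count_lttng_errors: read command output on stdin and print how many lines
# start with "Error:". Hypothetical helper for illustration.
count_lttng_errors() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function's status at 0 while still printing the count.
    grep -c '^Error:' || true
}

# Example (stderr is merged in so both streams are scanned):
#   lttng add-context --userspace --type vpid 2>&1 | count_lttng_errors
```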

andredasilvapinto avatar Sep 08 '22 13:09 andredasilvapinto

Given that these commands don't work outside of perfcollect, it doesn't sound like a bug in perfcollect. Since this is inside a container, is it possible for you to try this in a container whose seccomp profile is set to unconfined? Perhaps there is another permission that is required that the container doesn't have?
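
With Docker, seccomp filtering is turned off for a single container via `--security-opt seccomp=unconfined`. The sketch below only prints a candidate command line so it can be reviewed before running; `print_docker_cmd` is a hypothetical helper, the image name in the example is a placeholder, and the capability and environment variables are the ones already mentioned in this thread.

```shell
#!/bin/sh
# print_docker_cmd IMAGE: print (not run) a docker command line that starts
# IMAGE with seccomp filtering disabled, the SYS_ADMIN capability, and the
# two DOTNET_ variables from this thread. Hypothetical helper for illustration.
print_docker_cmd() {
    echo "docker run --rm -it --cap-add SYS_ADMIN" \
         "--security-opt seccomp=unconfined" \
         "-e DOTNET_PerfMapEnabled=1 -e DOTNET_EnableEventLog=1" \
         "$1"
}

# Example (my-app:latest is a placeholder image name):
#   print_docker_cmd my-app:latest
```

If the lttng commands succeed in a container started this way, the seccomp profile is the likely culprit; if they still fail, the cause lies elsewhere.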

If not, there is an old LTTng bug report that might help - it's not exactly the same, but could be related: https://bugs.lttng.org/issues/1078#:~:text=%20lttng%20enable-event%20throws%20the%20error%20%22UST%20create,session%2C%20which%20is%20kept%20for%20later.%20More%20.

brianrob avatar Sep 08 '22 17:09 brianrob

I tried running a different container on the same host with the same configuration, except for the docker image (I used the base image of the container with the problem). There the lttng commands work without needing a different seccomp profile, so the problem seems to be related to something specific to that docker image or application. I have no idea what could be causing it, though.

andredasilvapinto avatar Sep 09 '22 11:09 andredasilvapinto

It does sound like that. You may need to attach a debugger to the failing LTTng command to get more details.

brianrob avatar Sep 09 '22 22:09 brianrob