
Support for profiling

Open bruno-garcia opened this issue 3 years ago • 1 comments

Possibly through the same APIs used by dotnet-trace.

bruno-garcia avatar Sep 28 '22 21:09 bruno-garcia

https://docs.sentry.io/product/profiling/

mattjohnsonpint avatar Sep 28 '22 22:09 mattjohnsonpint

This would be helpful for us

cove avatar Feb 02 '23 18:02 cove

Came in on Discord:

Hi everyone - new user here 🙂 - Is there a recommendation for profiling on dotnet? We used the older version for a while and now with our signup noticed that dotnet isn't offered. Sorry if this is addressed elsewhere

bruno-garcia avatar Feb 14 '23 20:02 bruno-garcia

After making a small change to the dotnet-trace command-line app to connect to its own process (ignore the --process-id arg and use Process.GetCurrentProcess().Id instead), it seems to be perfectly happy about profiling itself. See the attached profiles and a screenshot from speedscope

dotnet-trace.exe_20230216_122438.zip

image

With the dotnet-trace code being licensed under MIT, it seems like a good candidate for cherry-picking an in-process profiling implementation that could be part of the Sentry SDK. We would likely need a way to filter out profiling-related events so they don't confuse people?

The CPU usage of the whole dotnet-trace executable, as reported by the process monitor on my PC, was between 0.0 and 0.2 %. Since it wasn't doing anything other than collecting the trace, that should represent the actual cost of sample collection (and writing to file).

image

vaind avatar Feb 16 '23 11:02 vaind

Status update:

I have rolled the nettrace processing directly into a fork of dotnet-trace (see the current working version here: https://github.com/vaind/diagnostics/tree/sentry-profiling). I am producing a JSON which I hope is correct, but I haven't had luck getting it to sentry.io yet. Likely the issue is with how I'm pushing the envelope with the profile manually (through sentry-cli), so it doesn't get associated with the transaction. Or it's just invalid - can't tell at the moment because I don't see what's happening on the server :/

profile.zip

vaind avatar Feb 24 '23 13:02 vaind

Two PRs need to get in in order to accept your profile:

  • Add dotnet as a platform: https://github.com/getsentry/relay/pull/1885
  • Deprecation of some fields: https://github.com/getsentry/relay/pull/1878

Looking at your profile, a few things I saw you'd need to correct:

  • timestamp needs to be RFC3339 formatted, like 2023-03-01T10:10:10.123456789+06:00 (the more precision the better, down to the nanosecond if possible).
  • os.build should be os.build_number and os.build_number should be os.version
  • transaction.id needs to be a uuid4 without -
  • there's no field id in thread_metadata values
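
A minimal sketch of three of these corrections, in Python purely for illustration (the SDK itself is .NET). The function names are hypothetical, and the shape assumed for thread_metadata (a mapping from thread id to a dict of values) is an assumption, not taken from the profile schema. The os field renaming is left out. Python's datetime only carries microsecond precision, so the nanosecond precision mentioned above is not reachable here.

```python
import uuid
from datetime import datetime, timedelta, timezone


def rfc3339_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339 with microseconds."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return dt.isoformat(timespec="microseconds")


def new_transaction_id() -> str:
    """Return a random UUID4 as 32 hex characters, without dashes."""
    return uuid.uuid4().hex


def strip_thread_ids(thread_metadata: dict) -> dict:
    """Drop the unsupported 'id' key from every thread_metadata value.

    Assumes thread_metadata maps a thread id to a dict of values.
    """
    return {
        thread: {k: v for k, v in values.items() if k != "id"}
        for thread, values in thread_metadata.items()
    }


if __name__ == "__main__":
    when = datetime(2023, 3, 1, 10, 10, 10, 123456,
                    tzinfo=timezone(timedelta(hours=6)))
    print(rfc3339_timestamp(when))
    print(new_transaction_id())
    print(strip_thread_ids({"1": {"id": 1, "name": "main"}}))
```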

phacops avatar Feb 24 '23 23:02 phacops

Also, at what rate are you sampling? We use 101Hz in our other SDKs.
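
For reference, a rough sketch of what a 101 Hz rate means in terms of sample spacing and sample count. The helper names are hypothetical, and rounding the interval to whole nanoseconds is an assumption made for illustration.

```python
SAMPLE_RATE_HZ = 101  # rate mentioned above for the other SDKs


def sample_interval_ns(rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Time between two samples, rounded to whole nanoseconds."""
    return round(1_000_000_000 / rate_hz)


def expected_sample_count(duration_s: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Approximate number of samples per thread over duration_s seconds."""
    return int(duration_s * rate_hz)


if __name__ == "__main__":
    print(sample_interval_ns())       # roughly 9.9 ms between samples
    print(expected_sample_count(30))  # samples in a 30 second profile
```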

phacops avatar Feb 24 '23 23:02 phacops

Note to self:

The raw addresses have to be associated with some entity that knows its symbolic name. At this point things work very differently for native code and code that is JIT compiled on the fly:

  • For native code, TraceEvent must find the DLL that includes the code. For this it needs information about all DLLs that were loaded in the process and the addresses they are loaded at. These are the kernel ImageLoad events.
  • For JIT compiled code it needs to know the code ranges of all JIT compiled methods. For this it needs special .NET or JScript events specifically designed for this purpose.

If the necessary events are not present, the best that can be done is to show the address value as a hexadecimal number (which is not very helpful). Thus it is critical that these events be present. Complicating this is the fact that many scenarios involve long-running processes: if the process lives longer than the collection interval, image loads or JIT compilations may have occurred before the trace started. We need these events as well. To get them, the ETW providers involved support something called 'CAPTURE_STATE', which causes them to emit events for all past image loads or JIT compilations. The logic for capturing data must explicitly include logic for triggering this CAPTURE_STATE.
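
The lookup described above can be sketched as a sorted table of code ranges. This is an illustration in Python, not how TraceEvent implements it: CodeRange and SymbolTable are hypothetical names, and ranges are assumed not to overlap. Each range would come from an image load or JIT method load event, and an address that falls in no known range is returned as a hexadecimal string, matching the fallback described above.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeRange:
    start: int  # first address covered by the range
    size: int   # length of the range in bytes
    name: str   # DLL or method name reported by the load event


class SymbolTable:
    """Maps raw addresses to names using non-overlapping code ranges."""

    def __init__(self) -> None:
        self._starts: list[int] = []
        self._ranges: list[CodeRange] = []

    def add(self, code_range: CodeRange) -> None:
        # Keep both lists sorted by start address so resolve() can bisect.
        i = bisect.bisect_left(self._starts, code_range.start)
        self._starts.insert(i, code_range.start)
        self._ranges.insert(i, code_range)

    def resolve(self, address: int) -> str:
        # Find the last range starting at or before the address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i >= 0:
            candidate = self._ranges[i]
            if address < candidate.start + candidate.size:
                return candidate.name
        # No load event covers this address: fall back to hex.
        return hex(address)


if __name__ == "__main__":
    table = SymbolTable()
    table.add(CodeRange(0x1000, 0x100, "MyApp.Program.Main"))
    print(table.resolve(0x1010))
    print(table.resolve(0x2000))
```

If the events for earlier image loads or JIT compilations are missing, the table is simply incomplete and more addresses take the hexadecimal fallback.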

vaind avatar Feb 27 '23 07:02 vaind

Both PRs have been merged and deployed, so there are no more blockers on our side.

phacops avatar Feb 28 '23 18:02 phacops

Closing this through #2206

Follow-ups are: #2315 and #2316

bruno-garcia avatar Apr 20 '23 15:04 bruno-garcia