semantic-conventions Add initial experimental .NET CLR runtime metrics

Fixes #956

Changes

Adds proposed experimental .NET CLR runtime metrics to the semantic conventions. Based on discussions with the .NET runtime team, the implementation plan will be to port the existing metrics from OpenTelemetry.Instrumentation.Runtime as directly as possible into the runtime itself. The names have been modified to align with the runtime environment metrics conventions.

Merge requirement checklist

[x] CONTRIBUTING.md guidelines followed.
[x] Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
[x] ~schema-next.yaml updated with changes to existing conventions.~

May 14 '24 10:05 stevejgordon

I'm not entirely sure why markdownlint fails on the autogenerated tables.

May 14 '24 10:05 stevejgordon

Have you run both attribute-registry-generation and table-generation?

May 14 '24 14:05 trisch-me

Have you run both attribute-registry-generation and table-generation?

I thought I had done so as the contents were updated, but I can try those again. I'm having to hack around the tooling a bit to get things running on Windows.

May 14 '24 15:05 stevejgordon

@trisch-me I've updated the formatting per your suggestion and rerun both of those make targets. Neither made any changes to the markdown files.

May 14 '24 15:05 stevejgordon

Alternatively you could wait until #1000 will be merged, it has fix for this bug

May 15 '24 11:05 trisch-me

@stevejgordon The #1000 is merged so please re-run code generation locally and update your files. Thanks

May 15 '24 12:05 trisch-me

Thanks, @trisch-me. Looks good!

May 15 '24 13:05 stevejgordon

The committers listed above are authorized under a signed CLA.

:white_check_mark: login: stevejgordon / name: Steve Gordon (54833c468b85891622406d22108eff404ad8adbe, cdbef6a50a6442c53dff59ec42281ef5e60934be, 630ee50a381b41c38566c4eb86f1b015d4edcffe, 1a274040ac202430bb6477de7179c762b027495a, a4ad8952b6db4ab52974340c571a4d5080bcadb1, 6b0a6e942d5cfd962a1d49e1e75c1e3c21829ab6, 1aefc2939fc234939055c446af7151916920e2ec, dc63af3cb86e01462e5766a3165572f46a15aadc, a5da35ca68ae9c8d022ad2d55b74cb863f9e541a, 5d63a8f536b903c01554d6b5248809afb6495acf, c80a739bf579b754bba8e84b3c4c4a14bd89730c, 8717a4007e9781af7fcff748b53c138894d5f001, 3d14868285b7ed7efd72f288293ad966fab78103, b4b98c08d3ec10cd4f86ac837088373dc894c40a, d08792a53295d387a6bedfc4f428f0bbf4927180, aa4eb335e094965f7d2ee7b54a7a91225222ea08, c48510fc7a10b3d0d15011224a30914f5ceb79d8, 4ae73ed3fec0f1b53dae188141acd68e945f4ba6, 88fd58c4985f5ea83964a93a6369d2ad2afa7fc3, 72c16570cddd5daf0c455d7afae9ff9fe5dc4da4, fe47a3123adb5b04154c5d3eb070bb5211a67e1c, 90a9356bbc8dea9cfc23285578f8b9cadfe8c98f, f8866a1e446d0201c033e6b973bdeafc2b430f66, 866a85cc68b249e0efc4f945cd9cd2dfeba2537f, 8f95fd10110aea0ec9b7c13bd32ad6ea20f3b282, c6b3628e47ff8ef67ab1d3f46ca021b22e5a0de3, 6b7a0ed35e88a047b85e727b252a0a267ad68de6, 8cc317588b1f10be8429d6c524d09f1ca0002772, 8e90ea0647703cb54f79b8a669c14ae906b32992, f1dd3dd8a3f0e84e9137994fb14ec3439d17aa5e, eab22b20ab57c0ca0bfffaebcaf2f08b15eabca3, 085e33196ed1351ba5bd2de9598f3228dad3386e, fbd75f25d692155fc707796e5e3b01c069a7cf6e)
:white_check_mark: login: lmolkova / name: Liudmila Molkova (d104216ab578424673a4ee4b1a6acaaf199a236b, 47ed007e1b29ba92c02b1b893d9ad61b14c682fc, 6dcfd10525f3bc316e4a5b7b3cbb90455127e9fe)

May 17 '24 09:05 linux-foundation-easycla[bot]

/easycla

May 28 '24 18:05 trask

Looking at the discussions in this PR, I want to reiterate the proposal https://github.com/open-telemetry/semantic-conventions/pull/1035#pullrequestreview-2077532984:

Let's think about user experience - which metrics users want to see first - my assumption is CPU, memory (from all sources, ideally in one/few metrics, GC, maybe threads. Let try to come up with a few metrics that would not require users to have a deep prior expertise in .NET memory management or know subtle differences between different .NET flavors.

If we need some advanced, precise things, they should come as an addition to basic CPU/mem/GC things. Let's focus on basics first.

Jun 03 '24 18:06 lmolkova

If we need some advanced, precise things, they should come as an addition to basic CPU/mem/GC things. Let's focus on basics first.

Are you suggesting lets review some things first and some things later? Or do you mean "Lets see if we can avoid including some of these metrics in .NET 9 at all?" If its just review ordering I'd say no problem. If the goal is to have these metrics not contain all the metrics in the current OTel .NET runtime instrumentation I think that leads to trouble. One of the major scenarios will be people migrating from using older metrics to these metrics and every metric we remove makes that migration harder or discourages them from migrating at all. If we want to have a simple set of metrics for folks just getting started I think a good approach to that will be docs or a pre-made dashboard, not by excluding metrics from the underlying instrumentation.

EDIT: Just to add I know a few of the metrics may feel a bit advanced or niche, but they are there because customer feedback wanted them to be there. We took a bunch of things out during the move from Windows Performance Counters -> EventCounters, but customers gave us feedback that we had cut too deep and please restore some of the metrics that were important to them.

Jun 03 '24 23:06 noahfalk

on reducing the scope:

@noahfalk thanks for the explanation. If it's either all or nothing, I believe we should still start from the basics (and try to do everything in .NET 9).

The main concerns on the basics from my side are:

CPU - https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1626774736 - seems trivial, just needs to be defined.
Memory: https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1613860662 and https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1626780003

The rest seems more cosmetic based on our discussions on duration/histograms in the https://github.com/dotnet/runtime/issues/85372#issuecomment-2138800897

Jun 05 '24 01:06 lmolkova

The main concerns on the basics from my side are...

Thanks @lmolkova, much appreciated! I'm just waiting to make sure I've got the right idea about the cpu portion of the discussion and then I'll respond back on the memory part as well.

Jun 07 '24 06:06 noahfalk

@noahfalk / @lmolkova - I've returned to the office and updated some core changes discussed above.

clr.* is now dotnet.*
filenames are updated to align with the clr > dotnet change
I've added a dotnet.exception.type attribute to the registry
I've reintroduced details in dotnet-metrics.md about the equivalent API used to collect the metric value.

@trisch-me One issue, if we stick with the updated file naming, is docs/attributes-registry/dotnet.md where ideally, the title would be .NET rather than Dotnet. Is there a way to configure that?

Jun 18 '24 11:06 stevejgordon

@stevejgordon not in current implementation but as soon as this issue is implemented, which should be soon, you will be able to add to the dotnet group a title as .Net

Jun 18 '24 14:06 trisch-me

@noahfalk / @lmolkova - I've returned to the office and updated some core changes discussed above.

Thanks and welcome back! I think there were two other areas where change has been proposed and not yet captured in updated docs:

Add a dotnet.cpu.time metric to provide CPU usage info: https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1634044798
Add either a dotnet.heap.used or dotnet.memory.used: https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1634044832 This memory one seemed the roughest at the moment. As best I can tell dotnet.memory.used is a super-set of dotnet.heap.used so perhaps that is the only one we'd need? Presumably adding either of these metrics would effectively replace/rename data that is currently exposed in dotnet.gc.committed_memory.size and dotnet.gc.heap.size. I don't think it was a replacement for any of the other gc size-related metrics.

@lmolkova - I'm still interested to get your feedback on those memory related metrics trying to refine what would be there :)

Thanks!

Jun 19 '24 10:06 noahfalk

Add a dotnet.cpu.time metric to provide CPU usage info: Add initial experimental .NET CLR runtime metrics #1035 (comment)

Yep, I still need to add dotnet.cpu.time and dotnet.cpu.count. The first depends on #1026 being merged, so I've waited for now.

Jun 21 '24 11:06 stevejgordon

Thanks @stevejgordon I think we're getting pretty close!

Regarding CPU - the https://github.com/open-telemetry/semantic-conventions/pull/1026 is merged. But it'd be totally fine to send CPU as a separate PR.

Jun 28 '24 02:06 lmolkova

No problem, @lmolkova. Thanks again for the feedback. I've updated per your suggestions and also snuck in the new CPU metrics.

Jun 28 '24 09:06 stevejgordon

@lmolkova I'm thinking dotnet.thread_pool.queue.length unit should be {work_item} given what the queue length represents. Would you concur?

Jul 01 '24 15:07 stevejgordon

@lmolkova I'm thinking dotnet.thread_pool.queue.length unit should be {work_item} given what the queue length represents. Would you concur?

👍

Jul 01 '24 17:07 lmolkova

I've opened a PR for the implementation at https://github.com/dotnet/runtime/pull/104680. It's a draft for now while we finalise the last naming discussion. We'd need to get this approved quite soon to meet the .NET 9 deadline if any changes are needed here.

Jul 10 '24 15:07 stevejgordon

I'm declaring comment bankruptcy and will try to list active discussions here:

[x] codeowners (trivial) - https://github.com/open-telemetry/semantic-conventions/pull/1035/files#r1672592867
[x] exceptions.count -> exceptions (trivial) https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672597984
[x] heap.size -> last_collection_size (some discussion) https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672029675
[x] memory overall (big discussion) https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672057241
- [x] remove gc.heap.objects.size - https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672006570

(might add more if I see them)

Jul 10 '24 17:07 lmolkova

[Update] based on the offline discussion

Jul 10 '24 18:07 lmolkova

Discussed offline with @noahfalk @tarekgh @samsp-msft @reyang

Suggesting the following changes:

heap renames (https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672029675)
- [x] rename dotnet.gc.memory.total_allocated -> dotnet.gc.heap.total_allocated
- [x] rename dotnet.gc.memory.committed -> dotnet.gc.last_collection.memory.committed_size
- [x] rename dotnet.gc.heap.size -> dotnet.gc.last_collection.heap.size
- [x] rename dotnet.gc.heap.fragmentation -> dotnet.gc.last_collection.heap.fragmentation.size
[x] remove dotnet.gc.heap.objects.size (https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672006570)
[x] add dotnet.process.memory.working_set (https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672072345)
cpu
- [x] rename dotnet.cpu.time -> dotnet.process.cpu.time
- [x] rename dotnet.cpu.count -> dotnet.process.cpu.count
  - [follow up in semconv and/or in .NET docs, not blocking this PR] we'll need to document it well since it's not perfect
    - how docker/k8s limits are translated (rounding up)
    - is it dynamic?
    - describe how to use (e.g. document how to calculate cpu utilization from time and count)
[x] please also document meter name(s)

Jul 10 '24 21:07 lmolkova

Suggesting the following changes:

Thanks! I'm also checking with Maoni offline to ensure she doesn't think anything in our new naming is problematic, but given past discussions with her I think she is going to be happy.

Jul 10 '24 22:07 noahfalk

Thanks! I'm also checking with Maoni offline to ensure she doesn't think anything in our new naming is problematic, but given past discussions with her I think she is going to be happy.

Yep, all good with the names as proposed.

Jul 11 '24 06:07 noahfalk

Thanks, @lmolkova. I've pushed up all the suggested changes.

Jul 11 '24 07:07 stevejgordon

Given that we have an implementation ready to go, should we consider adding these as stable?

Jul 11 '24 07:07 stevejgordon

Given that we have an implementation ready to go, should we consider adding these as stable?

Let's add them as experimental for now, we can mark them as stable when .NET 9 is GA-ed.

Jul 11 '24 22:07 lmolkova

semantic-conventions semantic-conventions copied to clipboard

Add initial experimental .NET CLR runtime metrics

Changes

Merge requirement checklist

semantic-conventions
semantic-conventions copied to clipboard