semantic-conventions
                                
                                
                                
                                    semantic-conventions copied to clipboard
                            
                            
                            
                        Add initial experimental .NET CLR runtime metrics
Fixes #956
Changes
Adds proposed experimental .NET CLR runtime metrics to the semantic conventions. Based on discussions with the .NET runtime team, the implementation plan will be to port the existing metrics from OpenTelemetry.Instrumentation.Runtime as directly as possible into the runtime itself. The names have been modified to align with the runtime environment metrics conventions.
Merge requirement checklist
- [x] CONTRIBUTING.md guidelines followed.
 - [x] Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with 
[chore] 
 - If your PR does not need a change log, start the PR title with 
 - [x] ~schema-next.yaml updated with changes to existing conventions.~
 
I'm not entirely sure why markdownlint fails on the autogenerated tables.
Have you run both attribute-registry-generation and table-generation?
Have you run both
attribute-registry-generationandtable-generation?
I thought I had done so as the contents were updated, but I can try those again. I'm having to hack around the tooling a bit to get things running on Windows.
@trisch-me I've updated the formatting per your suggestion and rerun both of those make targets. Neither made any changes to the markdown files.
Alternatively you could wait until #1000 will be merged, it has fix for this bug
@stevejgordon The #1000 is merged so please re-run code generation locally and update your files. Thanks
Thanks, @trisch-me. Looks good!
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: stevejgordon / name: Steve Gordon (54833c468b85891622406d22108eff404ad8adbe, cdbef6a50a6442c53dff59ec42281ef5e60934be, 630ee50a381b41c38566c4eb86f1b015d4edcffe, 1a274040ac202430bb6477de7179c762b027495a, a4ad8952b6db4ab52974340c571a4d5080bcadb1, 6b0a6e942d5cfd962a1d49e1e75c1e3c21829ab6, 1aefc2939fc234939055c446af7151916920e2ec, dc63af3cb86e01462e5766a3165572f46a15aadc, a5da35ca68ae9c8d022ad2d55b74cb863f9e541a, 5d63a8f536b903c01554d6b5248809afb6495acf, c80a739bf579b754bba8e84b3c4c4a14bd89730c, 8717a4007e9781af7fcff748b53c138894d5f001, 3d14868285b7ed7efd72f288293ad966fab78103, b4b98c08d3ec10cd4f86ac837088373dc894c40a, d08792a53295d387a6bedfc4f428f0bbf4927180, aa4eb335e094965f7d2ee7b54a7a91225222ea08, c48510fc7a10b3d0d15011224a30914f5ceb79d8, 4ae73ed3fec0f1b53dae188141acd68e945f4ba6, 88fd58c4985f5ea83964a93a6369d2ad2afa7fc3, 72c16570cddd5daf0c455d7afae9ff9fe5dc4da4, fe47a3123adb5b04154c5d3eb070bb5211a67e1c, 90a9356bbc8dea9cfc23285578f8b9cadfe8c98f, f8866a1e446d0201c033e6b973bdeafc2b430f66, 866a85cc68b249e0efc4f945cd9cd2dfeba2537f, 8f95fd10110aea0ec9b7c13bd32ad6ea20f3b282, c6b3628e47ff8ef67ab1d3f46ca021b22e5a0de3, 6b7a0ed35e88a047b85e727b252a0a267ad68de6, 8cc317588b1f10be8429d6c524d09f1ca0002772, 8e90ea0647703cb54f79b8a669c14ae906b32992, f1dd3dd8a3f0e84e9137994fb14ec3439d17aa5e, eab22b20ab57c0ca0bfffaebcaf2f08b15eabca3, 085e33196ed1351ba5bd2de9598f3228dad3386e, fbd75f25d692155fc707796e5e3b01c069a7cf6e)
 - :white_check_mark: login: lmolkova / name: Liudmila Molkova (d104216ab578424673a4ee4b1a6acaaf199a236b, 47ed007e1b29ba92c02b1b893d9ad61b14c682fc, 6dcfd10525f3bc316e4a5b7b3cbb90455127e9fe)
 
/easycla
Looking at the discussions in this PR, I want to reiterate the proposal https://github.com/open-telemetry/semantic-conventions/pull/1035#pullrequestreview-2077532984:
Let's think about user experience - which metrics users want to see first - my assumption is CPU, memory (from all sources, ideally in one/few metrics, GC, maybe threads. Let try to come up with a few metrics that would not require users to have a deep prior expertise in .NET memory management or know subtle differences between different .NET flavors.
If we need some advanced, precise things, they should come as an addition to basic CPU/mem/GC things. Let's focus on basics first.
If we need some advanced, precise things, they should come as an addition to basic CPU/mem/GC things. Let's focus on basics first.
Are you suggesting lets review some things first and some things later? Or do you mean "Lets see if we can avoid including some of these metrics in .NET 9 at all?" If its just review ordering I'd say no problem. If the goal is to have these metrics not contain all the metrics in the current OTel .NET runtime instrumentation I think that leads to trouble. One of the major scenarios will be people migrating from using older metrics to these metrics and every metric we remove makes that migration harder or discourages them from migrating at all. If we want to have a simple set of metrics for folks just getting started I think a good approach to that will be docs or a pre-made dashboard, not by excluding metrics from the underlying instrumentation.
EDIT: Just to add I know a few of the metrics may feel a bit advanced or niche, but they are there because customer feedback wanted them to be there. We took a bunch of things out during the move from Windows Performance Counters -> EventCounters, but customers gave us feedback that we had cut too deep and please restore some of the metrics that were important to them.
on reducing the scope:
@noahfalk thanks for the explanation. If it's either all or nothing, I believe we should still start from the basics (and try to do everything in .NET 9).
The main concerns on the basics from my side are:
- CPU - https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1626774736 - seems trivial, just needs to be defined.
 - Memory: https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1613860662 and https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1626780003
 
The rest seems more cosmetic based on our discussions on duration/histograms in the https://github.com/dotnet/runtime/issues/85372#issuecomment-2138800897
The main concerns on the basics from my side are...
Thanks @lmolkova, much appreciated! I'm just waiting to make sure I've got the right idea about the cpu portion of the discussion and then I'll respond back on the memory part as well.
@noahfalk / @lmolkova - I've returned to the office and updated some core changes discussed above.
- clr.* is now dotnet.*
 - filenames are updated to align with the clr > dotnet change
 - I've added a 
dotnet.exception.typeattribute to the registry - I've reintroduced details in 
dotnet-metrics.mdabout the equivalent API used to collect the metric value. 
@trisch-me One issue, if we stick with the updated file naming, is docs/attributes-registry/dotnet.md where ideally, the title would be .NET rather than Dotnet. Is there a way to configure that?
@stevejgordon not in current implementation but as soon as this issue is implemented, which should be soon, you will be able to add to the dotnet group a title as .Net
@noahfalk / @lmolkova - I've returned to the office and updated some core changes discussed above.
Thanks and welcome back! I think there were two other areas where change has been proposed and not yet captured in updated docs:
- Add a dotnet.cpu.time metric to provide CPU usage info: https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1634044798
 - Add either a dotnet.heap.used or dotnet.memory.used: https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1634044832 This memory one seemed the roughest at the moment. As best I can tell dotnet.memory.used is a super-set of dotnet.heap.used so perhaps that is the only one we'd need? Presumably adding either of these metrics would effectively replace/rename data that is currently exposed in dotnet.gc.committed_memory.size and dotnet.gc.heap.size. I don't think it was a replacement for any of the other gc size-related metrics.
 
@lmolkova - I'm still interested to get your feedback on those memory related metrics trying to refine what would be there :)
Thanks!
- Add a dotnet.cpu.time metric to provide CPU usage info: Add initial experimental .NET CLR runtime metrics #1035 (comment)
 
Yep, I still need to add dotnet.cpu.time and dotnet.cpu.count. The first depends on #1026 being merged, so I've waited for now.
Thanks @stevejgordon I think we're getting pretty close!
Regarding CPU - the https://github.com/open-telemetry/semantic-conventions/pull/1026 is merged. But it'd be totally fine to send CPU as a separate PR.
No problem, @lmolkova. Thanks again for the feedback. I've updated per your suggestions and also snuck in the new CPU metrics.
@lmolkova I'm thinking dotnet.thread_pool.queue.length unit should be {work_item} given what the queue length represents. Would you concur?
@lmolkova I'm thinking
dotnet.thread_pool.queue.lengthunit should be{work_item}given what the queue length represents. Would you concur?
👍
I've opened a PR for the implementation at https://github.com/dotnet/runtime/pull/104680. It's a draft for now while we finalise the last naming discussion. We'd need to get this approved quite soon to meet the .NET 9 deadline if any changes are needed here.
I'm declaring comment bankruptcy and will try to list active discussions here:
- [x] codeowners (trivial) - https://github.com/open-telemetry/semantic-conventions/pull/1035/files#r1672592867
 - [x] exceptions.count -> exceptions (trivial) https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672597984
 - [x] heap.size -> last_collection_size (some discussion) https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672029675
 - [x] memory overall (big discussion) https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672057241
- [x] remove 
gc.heap.objects.size- https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672006570 
 - [x] remove 
 
(might add more if I see them)
[Update] based  on the offline discussion
Discussed offline with @noahfalk @tarekgh @samsp-msft @reyang
Suggesting the following changes:
- heap renames (https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672029675)
- [x] rename 
dotnet.gc.memory.total_allocated->dotnet.gc.heap.total_allocated - [x] rename 
dotnet.gc.memory.committed->dotnet.gc.last_collection.memory.committed_size - [x] rename 
dotnet.gc.heap.size->dotnet.gc.last_collection.heap.size - [x] rename 
dotnet.gc.heap.fragmentation->dotnet.gc.last_collection.heap.fragmentation.size 
 - [x] rename 
 - [x] remove 
dotnet.gc.heap.objects.size(https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672006570) - [x] add 
dotnet.process.memory.working_set(https://github.com/open-telemetry/semantic-conventions/pull/1035#discussion_r1672072345) - cpu
- [x] rename 
dotnet.cpu.time->dotnet.process.cpu.time - [x] rename 
dotnet.cpu.count->dotnet.process.cpu.count- [follow up in semconv and/or in .NET docs, not blocking this PR] we'll need to document it well since it's not perfect
- how docker/k8s limits are translated (rounding up)
 - is it dynamic?
 - describe how to use (e.g. document how to calculate cpu utilization from 
timeandcount) 
 
 - [follow up in semconv and/or in .NET docs, not blocking this PR] we'll need to document it well since it's not perfect
 
 - [x] rename 
 - [x] please also document meter name(s)
 
Suggesting the following changes:
Thanks! I'm also checking with Maoni offline to ensure she doesn't think anything in our new naming is problematic, but given past discussions with her I think she is going to be happy.
Thanks! I'm also checking with Maoni offline to ensure she doesn't think anything in our new naming is problematic, but given past discussions with her I think she is going to be happy.
Yep, all good with the names as proposed.
Thanks, @lmolkova. I've pushed up all the suggested changes.
Given that we have an implementation ready to go, should we consider adding these as stable?
Given that we have an implementation ready to go, should we consider adding these as stable?
Let's add them as experimental for now, we can mark them as stable when .NET 9 is GA-ed.