semantic-conventions
Design a `process.status` metric
Area(s)
area:system
Is your change request related to a problem? Please describe.
In https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33731 we received a proposal to add process.status as a resource attribute. Because a process's status changes over the course of its run, I don't believe it makes sense as a process resource attribute. I'd like to engage the System Semantic Conventions working group to design it as a metric instead.
Describe the solution you'd like
I am not sure. :slightly_smiling_face:
Process status is generally a string value, and on Linux at least it comes from a restricted set of options: running, uninterruptible sleep, interruptible sleep, stopped, zombie. Metric design isn't my strong suit, so I would like to hear from someone with a good idea for this, or from anyone who can point to a similar enum-style metric.
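For reference, a minimal sketch of where these states come from on Linux, assuming they are read from the single-letter state field of `/proc/<pid>/stat` (documented in proc(5)). The label strings and the function names `parse_state_code`, `process_status`, and `read_process_status` are hypothetical; they are not the values registered in semantic conventions. Kernel codes outside the five states listed above fall back to `"unknown"`.

```python
# Hypothetical labels based on the states listed in this issue;
# these are NOT the registered semantic-convention enum values.
LINUX_STATE_CODES = {
    "R": "running",
    "D": "uninterruptible_sleep",
    "S": "interruptible_sleep",
    "T": "stopped",
    "Z": "zombie",
}


def parse_state_code(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The process name (comm) is wrapped in parentheses and may itself
    contain spaces or ')', so split after the LAST ')' rather than on
    whitespace from the start of the line.
    """
    end_of_comm = stat_line.rfind(")")
    if end_of_comm == -1:
        raise ValueError("malformed stat line: no ')' found")
    fields = stat_line[end_of_comm + 1:].split()
    if not fields:
        raise ValueError("malformed stat line: no fields after comm")
    return fields[0]


def process_status(stat_line: str) -> str:
    """Map the kernel state code to a label; unlisted codes map to 'unknown'."""
    return LINUX_STATE_CODES.get(parse_state_code(stat_line), "unknown")


def read_process_status(pid: int) -> str:
    """Read and map the current state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat", encoding="utf-8", errors="replace") as f:
        return process_status(f.read())
```

For example, `process_status("1234 (my (odd) proc) S 1 1234")` returns `"interruptible_sleep"`. The point is only that the value set is small and closed, which is what makes an enum-style metric plausible.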
Describe alternatives you've considered
No response
Additional context
No response
cc @open-telemetry/semconv-system-approvers
I do see system.process.status in the semantic conventions. https://github.com/open-telemetry/semantic-conventions/blob/v1.26.0/docs/attributes-registry/system.md#system-process-attributes
Thanks @braydonk!
Regarding the immutability concern: we are facing this in several other places like at https://github.com/open-telemetry/semantic-conventions/pull/997#issuecomment-2129538490 and https://github.com/open-telemetry/semantic-conventions/issues/1160#issuecomment-2176190366. This seems to be another one.
Regarding the process.status, do you propose another attribute different than https://github.com/open-telemetry/semantic-conventions/blob/v1.26.0/docs/attributes-registry/system.md#system-process-attributes which @ishleenk17 also mentioned?
@tigrannajaryan This is another case for the need of mutable resource attributes.
Is there already some central place where the Entity WG is tracking these cases? If not would it make sense to have single issue to reference them all?
This repository is the best place to track them. Entity SIG is not currently working on specific cases. We are working on generic data model. The work on specific cases for concrete entity types will happen in this repo.
The system.process.status attribute is used for the overall system process count metrics. It covers the same concept we'd want here; however, for an individual process it couldn't be used as a resource attribute because of the mutability issue, so I am not sure where it would go.
Discussed in System Semantic Conventions meeting June 27. The only effective way to use process.status under the current data model is as a resource attribute, despite the mutability problem.
Recording the decisions:
- I'm going to accept `process.status` as a resource attribute upstream in the hostmetricsreceiver.
- I think `system.process.status` in the current semantic conventions spec should move to `process.status` to live with the other process attributes. I will add it as a resource attribute in semantic conventions as well after that's done.
- When the Entity data model starts to take shape, this will be adapted to that instead.
I will attach this issue to the change that moves system.process.status, and I can either keep this issue open or create a new one to track how the process conventions will adapt to the upcoming Entity data model.
If we have decided to proceed with it, can we proceed with this PR: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33731 ?
Yes that is the plan. I was going to finish my PR to semantic conventions to add it as a resource attribute so we can follow it in that PR. I will give it a review once I have done that.
A resource CANNOT change over its lifetime, for a few reasons:
- SDK-specified resources are immutable, and we rely on that for many features.
- Metrics identity is tied to resource. If a resource attribute changes, you have disjoint metrics/timeseries.
- Resource attributes are used as identifying attributes for bundling purposes. Again, mutable attributes (or super high cardinality ones) can cause issues in the collector's handling/joining of data and on backends interpreting the data.
- Mutable resource attributes are being considered for instances where the identity of the resource can change during the lifetime of the SDK. That does not mean a process changing status, but it DOES mean a browser creating a new client session, where we want the resource to be the client session, not the browser.
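To make the metric-identity point concrete, here is a simplified sketch, assuming a time series is identified by its resource attributes, metric name, and point attributes (the collector and real backends differ in the details). `series_key` and `group_points` are hypothetical helpers, and `process.cpu.time` is used only as an example metric name.

```python
from typing import Dict, FrozenSet, List, Tuple

Attrs = Dict[str, str]
# Simplified series identity: (resource attributes, metric name, point attributes).
SeriesKey = Tuple[FrozenSet[Tuple[str, str]], str, FrozenSet[Tuple[str, str]]]
Point = Tuple[Attrs, str, Attrs, float]


def series_key(resource: Attrs, metric_name: str, point_attrs: Attrs) -> SeriesKey:
    """Build an order-independent key identifying one time series."""
    return (frozenset(resource.items()), metric_name, frozenset(point_attrs.items()))


def group_points(points: List[Point]) -> Dict[SeriesKey, List[float]]:
    """Group data points into series, the way a backend would join them."""
    series: Dict[SeriesKey, List[float]] = {}
    for resource, name, attrs, value in points:
        series.setdefault(series_key(resource, name, attrs), []).append(value)
    return series


# The same process (pid 1234) reported twice; only process.status changed.
points: List[Point] = [
    ({"process.pid": "1234", "process.status": "running"}, "process.cpu.time", {}, 1.5),
    ({"process.pid": "1234", "process.status": "interruptible_sleep"}, "process.cpu.time", {}, 1.6),
]
series = group_points(points)
print(len(series))  # 2: one logical counter split into two disjoint series
```

Because `process.status` sits in the resource, one process produces two unrelated series as soon as it goes from running to sleeping; with only `process.pid` in the resource, both points land in a single series.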
Entities are a WIP and do not exist yet. The goal of this group is to find a way to allow descriptive resource attributes that can change to be described/reported. The discussion is still TBD, but it is NOT guaranteed (or likely, in my opinion) that descriptive attributes, particularly problematic ones like process.status, would end up in resource, for all the reasons mentioned above.
Having process status in the registry (https://github.com/open-telemetry/semantic-conventions/blob/v1.26.0/docs/attributes-registry/system.md#system-process-attributes) does NOT mean it is for resource. This just means we have a consistent name for it regardless of where it shows up.
I am against adding process.status to Resource. This would break so many requirements of the opentelemetry data model.
I would also reject this request in the collector.
Thanks for the feedback @jsuereth! I'm a bit confused, though, about what the general guidance is here. https://github.com/open-telemetry/semantic-conventions/issues/1160#issuecomment-2217998698, regarding the k8s.pod.ip attribute, seems to contradict your comment 🤔. Would it be possible to clarify this mutable vs. immutable topic across SemConv, to ensure that we don't accidentally violate any ruling in the ongoing work-streams?
Yes, working on that. I'm going to discuss it with the TC to make sure we have alignment, but for now let's stick to the guidance from this thread.
Does host.ip fulfill this guidance?
host.ip is an interesting conundrum. I'd say in many cases it's stable enough to be acceptable in resource. In instances where it's unstable, and changes frequently, we should not use it and should prefer a more stable ID. I expect decisions around that to fall out of the Entity WG work, specifically around allowing Entities to have identifying and descriptive attributes. We're still working on core modelling.
The guidance for how to design status metrics exists now: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/how-to-write-conventions/status-metrics.md
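A minimal sketch of one enum-style encoding, assuming the approach of reporting every possible state as its own data point, with value 1 for the current state and 0 for all others; check the linked guidance for the exact recommended instrument type and attribute naming. The state list and `status_points` are hypothetical, and `process.status` is used here as a point attribute rather than a resource attribute.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical state values, based on the Linux states listed in this issue.
PROCESS_STATES: Tuple[str, ...] = (
    "running",
    "interruptible_sleep",
    "uninterruptible_sleep",
    "stopped",
    "zombie",
)


def status_points(
    current: str, states: Sequence[str] = PROCESS_STATES
) -> List[Tuple[Dict[str, str], int]]:
    """Emit one point per possible state: 1 for the current state, 0 otherwise.

    The status is a point attribute, so the resource (e.g. process.pid)
    never changes when the process moves between states.
    """
    if current not in states:
        raise ValueError(f"unknown process state: {current!r}")
    return [({"process.status": s}, 1 if s == current else 0) for s in states]


# Two collections of the same process: the attribute sets are identical,
# only the values move.
first = status_points("running")
second = status_points("stopped")
print([v for _, v in first])   # [1, 0, 0, 0, 0]
print([v for _, v in second])  # [0, 0, 0, 1, 0]
```

The set of attribute combinations is the same on every collection, so the series stay stable and only their values change when the process moves between states.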
I'll submit a PR to add process.status.
Moved this out of blocked since it's no longer blocked.