Design a `process.status` metric

Open • braydonk opened this issue 1 year ago • 14 comments

Area(s)

area:system

Is your change request related to a problem? Please describe.

In https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33731 we received a proposal to add process.status as a resource attribute. Because the process status changes over the course of a single process run, I believe it does not make sense as a process resource attribute. I'd like to engage the System Semantic Conventions working group to design it as a metric instead.

Describe the solution you'd like

I am not sure. 🙂

Process status is generally a string value, and on Linux at least it comes from a restricted set of options: running, uninterruptible sleep, interruptible sleep, stopped, zombie. Metric design isn't my strong suit, so I would like to hear from anyone with a good idea for this, or from anyone who can point to a similar enum-style metric.
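One common way to express an enum-style value as a metric is a "one-hot" set of data points: one point per possible status, with value 1 for the current status and 0 for all the others, similar to how hw.status is modeled in the hardware semantic conventions. The Python sketch below shows that shape for a single Linux process. It is an illustration under stated assumptions, not an agreed design: the attribute name process.status on the data points and the status strings are hypothetical, the mapping covers only the five states listed above, and the state letter is read from the field that follows the parenthesized command name in /proc/<pid>/stat.

```python
# Hypothetical mapping from the state letter in Linux /proc/<pid>/stat
# to the five statuses listed in the issue. Other letters (e.g. "I" for
# idle kernel threads) are deliberately left unmapped in this sketch.
LINUX_STATE_TO_STATUS = {
    "R": "running",
    "S": "interruptible_sleep",
    "D": "uninterruptible_sleep",
    "T": "stopped",
    "Z": "zombie",
}

# Every status gets its own series; dicts keep insertion order.
ALL_STATUSES = tuple(LINUX_STATE_TO_STATUS.values())


def parse_state(stat_line: str) -> str:
    """Return the state letter from one /proc/<pid>/stat line.

    The second field (the command name) is wrapped in parentheses and may
    itself contain spaces or ')', so split after the LAST ')' instead of
    splitting the whole line on whitespace.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def status_points(stat_line: str) -> list[tuple[dict[str, str], int]]:
    """Build one (attributes, value) data point per possible status.

    The current status is reported as 1 and every other status as 0, so
    for a mapped state the values across all series sum to exactly 1.
    """
    current = LINUX_STATE_TO_STATUS.get(parse_state(stat_line))
    return [
        ({"process.status": status}, 1 if status == current else 0)
        for status in ALL_STATUSES
    ]


if __name__ == "__main__":
    # Sample line with an awkward command name containing spaces and parens.
    for attrs, value in status_points("1234 (my (odd) proc) S 1 1234 1234 0 -1"):
        print(attrs["process.status"], value)
```

Reporting the zero-valued points as well keeps every series present, so a backend sees a transition (the running point drops from 1 to 0) rather than a series that simply stops. State letters outside the mapping yield all zeros in this sketch.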

Describe alternatives you've considered

No response

Additional context

No response

braydonk, Jun 25 '24

cc @open-telemetry/semconv-system-approvers

braydonk, Jun 25 '24

I do see system.process.status in the semantic conventions. https://github.com/open-telemetry/semantic-conventions/blob/v1.26.0/docs/attributes-registry/system.md#system-process-attributes

ishleenk17, Jun 26 '24

Thanks @braydonk!

Regarding the immutability concern: we are facing this in several other places, such as https://github.com/open-telemetry/semantic-conventions/pull/997#issuecomment-2129538490 and https://github.com/open-telemetry/semantic-conventions/issues/1160#issuecomment-2176190366. This seems to be another one.

Regarding process.status, are you proposing an attribute different from https://github.com/open-telemetry/semantic-conventions/blob/v1.26.0/docs/attributes-registry/system.md#system-process-attributes, which @ishleenk17 also mentioned?

ChrsMark, Jun 26 '24

@tigrannajaryan This is another case for the need of mutable resource attributes.

Is there already some central place where the Entity WG is tracking these cases? If not, would it make sense to have a single issue to reference them all?

AlexanderWert, Jun 26 '24

Is there already some central place where the Entity WG is tracking these cases? If not, would it make sense to have a single issue to reference them all?

This repository is the best place to track them. The Entity SIG is not currently working on specific cases; we are working on the generic data model. The work on specific cases for concrete entity types will happen in this repo.

tigrannajaryan, Jun 26 '24

The system.process.status attribute is used for the overall system process count metrics. It covers the same concept we'd want here; however, for an individual process it couldn't be used as a resource attribute because of the mutability issue, so I am not sure where it would go in that case.
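For contrast, at host scope the status works fine as a data point attribute, because the resource (the host) stays the same while individual processes move between states. A minimal sketch, assuming a host-level metric named system.process.count that counts processes per system.process.status value; the input statuses and the host.name value are made-up sample data, and process_count_points is a hypothetical helper.

```python
from collections import Counter


def process_count_points(statuses: list[str]) -> list[tuple[dict[str, str], int]]:
    """Aggregate per-process statuses into host-level (attributes, value) points.

    Each distinct status becomes one data point on a host-scoped metric
    (assumed here to be system.process.count); the host resource itself
    never changes, only the point attributes vary.
    """
    counts = Counter(statuses)
    # Sort by status name so the output order is deterministic.
    return [
        ({"system.process.status": status}, count)
        for status, count in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Made-up sample: statuses observed for four processes on one host.
    sample = ["running", "sleeping", "sleeping", "stopped"]
    resource = {"host.name": "example-host"}  # hypothetical, stays fixed
    for attrs, count in process_count_points(sample):
        print(resource, "system.process.count", attrs, count)
```

The per-process question is harder precisely because, for a single process, the thing that would carry the status is the process resource itself.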

braydonk, Jun 26 '24

Discussed in the System Semantic Conventions meeting on June 27. The only effective way to use process.status under the current data model is as a resource attribute, despite the mutability problem.

Recording the decisions:

  • I'm going to accept process.status as a resource attribute upstream in the hostmetricsreceiver.
  • I think system.process.status in the current semantic conventions spec should move to process.status to live with the other process attributes. I will add it as a resource attribute in semantic conventions as well after that's done.
  • When Entity data model starts to take shape, this will be adapted to that instead.

I will use this issue to track moving system.process.status, and I can either keep this issue open or create a new one to track how the process conventions will adapt to the upcoming Entity data model.

braydonk, Jun 27 '24

  • I'm going to accept process.status as a resource attribute upstream in the hostmetricsreceiver.

If we have decided to go ahead with it, can we proceed with this PR: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33731?

ishleenk17, Jun 28 '24

Yes that is the plan. I was going to finish my PR to semantic conventions to add it as a resource attribute so we can follow it in that PR. I will give it a review once I have done that.

braydonk, Jun 28 '24

A Resource CANNOT change over its lifetime, for a few reasons:

  • SDK-specified resources are immutable, which we rely on for many features.
  • Metrics identity is tied to resource. If a resource attribute changes, you have disjoint metrics/timeseries.
  • Resource attributes are used as identifying attributes for bundling purposes. Again, mutable attributes (or super high cardinality ones) can cause issues in the collector's handling/joining of data and on backends interpreting the data.
  • Mutable resource attributes are being considered for instances where the identity of the resource can change during the lifetime of the SDK. That does not mean a process changing status, but it DOES mean a browser creating a new client session, where we want the resource to be the client session, not the browser.
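The timeseries-identity point can be made concrete with a simplified key. This sketch assumes identity is the set of resource attributes plus the metric name plus the point attributes; real backends and the OTLP data model also consider things like instrumentation scope and metric type, which are omitted here. The helper series_key and the attribute values are hypothetical.

```python
def series_key(resource: dict, metric_name: str, point_attrs: dict) -> tuple:
    """Simplified timeseries identity (hypothetical helper).

    Identity here = resource attributes + metric name + point attributes.
    Scope, metric type, unit, etc. are ignored to keep the sketch small.
    """
    return (
        frozenset(resource.items()),
        metric_name,
        frozenset(point_attrs.items()),
    )


# Case 1: process.status lives on the resource and changes mid-run.
before = series_key({"process.pid": 1234, "process.status": "running"}, "process.cpu.time", {})
after = series_key({"process.pid": 1234, "process.status": "sleeping"}, "process.cpu.time", {})

# Case 2: the resource only carries stable attributes.
stable_before = series_key({"process.pid": 1234}, "process.cpu.time", {})
stable_after = series_key({"process.pid": 1234}, "process.cpu.time", {})

if __name__ == "__main__":
    # Same process, same metric, yet case 1 produces two distinct series.
    print("mutable resource splits the series:", before != after)  # True
    print("stable resource keeps one series:", stable_before == stable_after)  # True
```

In case 1, every metric reported for the process (not just a status metric) is split into a new series whenever the status changes; keeping the status off the resource avoids that.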

Entities is a WIP and does not exist yet. The goal of this group is to find a way to allow descriptive attributes of a resource that can change to be described/reported. The discussion is still TBD, but it is NOT guaranteed (or, in my opinion, likely) that descriptive attributes, particularly problematic ones like process.status, would end up in resource, for all the reasons mentioned above.

Having process status in the registry (https://github.com/open-telemetry/semantic-conventions/blob/v1.26.0/docs/attributes-registry/system.md#system-process-attributes) does NOT mean it is for resource. This just means we have a consistent name for it regardless of where it shows up.

I am against adding process.status to Resource. This would break so many requirements of the opentelemetry data model.

I would also reject this request in the collector.

jsuereth, Jul 15 '24

Thanks for the feedback @jsuereth! I'm a bit confused, though, about the general guidance here. https://github.com/open-telemetry/semantic-conventions/issues/1160#issuecomment-2217998698, regarding the k8s.pod.ip attribute, seems to contradict your comment 🤔. Would it be possible to clarify this mutable vs. immutable topic horizontally for SemConv, to ensure that we don't accidentally violate any ruling in any of the ongoing work-streams?

ChrsMark, Jul 16 '24

Yes, working on that. I'm going to discuss it with the TC to make sure we have alignment, but for now let's stick to the guidance from this thread.

jsuereth, Jul 16 '24

Does host.ip fulfill this guidance?

mx-psi, Jul 16 '24

host.ip is an interesting conundrum. I'd say in many cases, it's stable enough to be acceptable in resource. In instances where it's unstable, and unstable frequently, we should not be using it and prefer a more stable id. I expect decisions around that to fall out in Entity WG work - Specifically around allowing Entities to have identifying and descriptive attributes. We're still working on core modelling.

jsuereth, Jul 16 '24

The guidance for how to design status metrics exists now: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/how-to-write-conventions/status-metrics.md

I'll submit a PR to add process.status.

braydonk, Oct 27 '25

Moved this out of blocked, since it no longer is.

ChrsMark, Oct 29 '25