
Add system uptime metric

Open andrzej-stencel opened this issue 3 years ago • 12 comments

What are you trying to achieve?

I want to add a metric to the semantic conventions that will describe the system uptime. How about system.uptime?

Additional context.

This is reported by Telegraf as uptime field of the system metric (in seconds).

Here's a related proposal on the hostmetrics receiver to add this metric: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/14130.

andrzej-stencel avatar Sep 20 '22 12:09 andrzej-stencel

Is this duplicate of https://github.com/open-telemetry/opentelemetry-specification/issues/1273 ?

tigrannajaryan avatar Sep 20 '22 13:09 tigrannajaryan

Thanks ~~Dan~~ Tigran :facepalm:, didn't see this issue. It is closely related. It talks about process namespace and not system, but I think the discussion can be applied to system too. If I understand correctly, (at least from the perspective of this issue) it boils down to adding an attribute process.start_time and system.start_time.
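To make the relationship between an uptime metric and a start-time attribute concrete, here is a minimal sketch. It assumes the start time is available as a timezone-aware timestamp; the function names are hypothetical and not part of any OpenTelemetry API.

```python
from datetime import datetime, timedelta, timezone


def uptime_seconds(start_time: datetime, now: datetime) -> float:
    """Derive an uptime value (in seconds) from a start_time attribute."""
    return (now - start_time).total_seconds()


def start_time_from_uptime(uptime: float, now: datetime) -> datetime:
    """Go the other way: recover the start time from a reported uptime."""
    return now - timedelta(seconds=uptime)


if __name__ == "__main__":
    # Illustrative timestamps only.
    boot = datetime(2022, 9, 20, 12, 0, tzinfo=timezone.utc)
    now = datetime(2022, 9, 20, 15, 0, tzinfo=timezone.utc)
    print(uptime_seconds(boot, now))
```

Either value can be computed from the other given the current time, which is why the two discussions overlap: a start time is constant for the life of the entity, while an uptime changes on every report.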

In fact, I can see there's a process namespace for attributes, but I cannot see a system namespace for attributes - only an os namespace. Would the attribute become os.start_time then?

Also when running the OT collector with the hostmetrics receiver, I cannot see any attributes from the os. namespace being reported (this is of course out of scope of this issue and repository).

andrzej-stencel avatar Sep 20 '22 15:09 andrzej-stencel

We discussed this briefly during today's SIG Spec call, let's see where the conversation in open-telemetry/opentelemetry-specification#1273 takes us.

andrzej-stencel avatar Sep 20 '22 15:09 andrzej-stencel

I would support system.uptime as a metric that measures the uptime of the system. The process.uptime is a different concern.

On Linux, system.uptime would be read from /proc/uptime, and analogously from the equivalent source on other operating systems.
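A minimal sketch of the Linux case, assuming the usual /proc/uptime layout in which the first whitespace-separated field is the number of seconds since boot. The function names are hypothetical, and this is not the hostmetrics receiver's implementation.

```python
from pathlib import Path


def parse_proc_uptime(text: str) -> float:
    """Return the uptime in seconds from the contents of /proc/uptime.

    The first field is the time since boot; the second field (aggregate
    idle time) is ignored here.
    """
    fields = text.split()
    if not fields:
        raise ValueError("empty /proc/uptime contents")
    return float(fields[0])


def read_system_uptime(path: str = "/proc/uptime") -> float:
    """Read and parse the uptime file; Linux only."""
    return parse_proc_uptime(Path(path).read_text())


if __name__ == "__main__":
    print(read_system_uptime())
```

Keeping the parsing separate from the file read makes the parser testable on any platform, with the file path being the only Linux-specific piece.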

jamesmoessis avatar Sep 21 '22 01:09 jamesmoessis

I support both system.uptime and process.uptime semantic conventions.

jmacd avatar Sep 21 '22 19:09 jmacd

These all make sense, but please pause for now, we are considering refactoring existing semantic conventions. Please come to ongoing discussions. See https://github.com/open-telemetry/opentelemetry-specification/issues/2753.

reyang avatar Sep 23 '22 15:09 reyang

@jsuereth Can we transfer this to the semantic-conventions repository?

mx-psi avatar Jan 18 '24 16:01 mx-psi

Q. Is there any plan to do it? I'm interested in it.

minuk-dev avatar Jun 05 '24 18:06 minuk-dev

@kernelpanic77 offered to do it here https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31627#issuecomment-2133882411 :)

mx-psi avatar Jun 06 '24 08:06 mx-psi

Looks like we have an agreement here. Just need someone to submit a PR

dmitryax avatar Aug 08 '24 14:08 dmitryax

Is there any way we can generalize this to be an "uptime" of any entity, not just of "system"? What if we make this an uptime metric with the Resource describing what it is about (e.g. Resource can have "host.name=foo" to indicate that it is an uptime of a host).
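A rough sketch of what this could look like, using a plain dataclass rather than any OpenTelemetry SDK type. `MetricPoint` and `uptime_of` are hypothetical names, and the `process.pid` entry is only an assumed second entity for contrast.

```python
from dataclasses import dataclass, field


@dataclass
class MetricPoint:
    """A simplified stand-in for a metric data point plus its Resource."""
    name: str
    value: float
    unit: str
    resource: dict = field(default_factory=dict)


def uptime_of(points: list, resource_key: str) -> list:
    """Select generic "uptime" points whose Resource identifies the entity."""
    return [
        p for p in points
        if p.name == "uptime" and resource_key in p.resource
    ]


if __name__ == "__main__":
    points = [
        MetricPoint("uptime", 3600.0, "s", {"host.name": "foo"}),
        MetricPoint("uptime", 120.0, "s", {"process.pid": 4242}),
    ]
    # The metric name is the same; only the Resource says what it describes.
    for p in uptime_of(points, "host.name"):
        print(p.resource["host.name"], p.value)
```

With this shape, the metric name alone no longer says what the value describes, so every consumer has to inspect the Resource to interpret it.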

tigrannajaryan avatar Aug 12 '24 14:08 tigrannajaryan

> Is there any way we can generalize this to be an "uptime" of any entity, not just of "system"? What if we make this an uptime metric with the Resource describing what it is about (e.g. Resource can have "host.name=foo" to indicate that it is an uptime of a host).

I suppose we could do it, I wonder what others think.

The uptime attribute name would not be namespaced, unlike system.uptime that is namespaced to system. Looking at the Attributes Registry, it doesn't look like we currently have any non-namespaced attributes in the semantic conventions. Is that true?

andrzej-stencel avatar Aug 13 '24 11:08 andrzej-stencel

> Is there any way we can generalize this to be an "uptime" of any entity, not just of "system"?

I think this is part of a broader discussion taking place in https://github.com/open-telemetry/semantic-conventions/issues/1161 (system.uptime vs process.uptime vs container.uptime). The main benefit of using the metric without a namespace seems to be dashboard correlation and avoiding duplication. But it comes at the cost of requiring specific resource attributes on the corresponding metrics (not sure if this is possible in semconv). For example, the uptime metric would always have to be linked to one of host.name, process.pid or container.id.

During the System Semantic Conventions SIG (20/06/2024) we agreed on keeping the metrics in namespaces (even if there are duplications) due to:

- The potential for minute differences between the meanings of seemingly identical metrics between the different contexts.
- The namespaces also semantically represent the reporting source, making query scenarios more clear (i.e. "I want all my operating system process metrics" or "I want all my jvm metrics" has a clear separation due to the metrics reported from each source all having their respective namespaces).
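The query scenario in the second point can be sketched as a simple prefix match on metric names. This is an illustration only: `metrics_in_namespace` is a hypothetical helper, and `systemd.unit.state` is a made-up name included to show why the match has to include the trailing dot.

```python
def metrics_in_namespace(names: list, namespace: str) -> list:
    """Return the metric names that belong to the given namespace."""
    prefix = namespace.rstrip(".") + "."
    return [name for name in names if name.startswith(prefix)]


if __name__ == "__main__":
    names = [
        "system.uptime",       # proposed in this issue
        "system.cpu.time",
        "process.uptime",      # discussed in this thread
        "jvm.memory.used",
        "systemd.unit.state",  # hypothetical, not a real convention
    ]
    print(metrics_in_namespace(names, "system"))
    print(metrics_in_namespace(names, "jvm"))
```

With a non-namespaced uptime metric, the same question could not be answered from the metric name alone and would need a filter on resource attributes instead.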

rogercoll avatar Aug 20 '24 09:08 rogercoll