semantic-conventions

Add system metrics reporting total memory capacity or clarify how to recover existing ones

Open • mx-psi opened this issue 2 years ago • 6 comments

Current system metrics cover usage of memory, paging/swap memory and filesystems, but we don't currently support the total capacity of any of these systems as a separate metric.

The following metric can't be recovered from the existing system metrics and would need to be added to support this:

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.disk.limit | Total storage capacity of the disk. | By | UpDownCounter | Int64 | device | (identifier) |
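Since system.disk.limit cannot be derived from existing metrics, a collector would have to read the capacity from the operating system. As a minimal sketch, assuming a Linux host: sysfs exposes each block device's size in /sys/block/<device>/size in 512-byte sectors, so multiplying by 512 gives the capacity in bytes (By). The function name `disk_limits` and the use of the sysfs directory name as the `device` attribute value are illustrative assumptions, not part of any existing receiver.

```python
from pathlib import Path
from typing import Dict

# Linux sysfs reports block device sizes in 512-byte sectors,
# regardless of the device's logical block size.
SECTOR_SIZE = 512


def disk_limits(sys_block: Path = Path("/sys/block")) -> Dict[str, int]:
    """Return {device: capacity in bytes} for each block device under sys_block.

    Each entry is one candidate data point for the proposed system.disk.limit
    metric, using the sysfs directory name as the (hypothetical) device value.
    """
    limits: Dict[str, int] = {}
    for dev_dir in sorted(sys_block.iterdir()):
        size_file = dev_dir / "size"
        if size_file.is_file():  # skip entries without a size attribute
            sectors = int(size_file.read_text().strip())
            limits[dev_dir.name] = sectors * SECTOR_SIZE
    return limits
```

On a Linux machine, `disk_limits()` returns one entry per block device, including virtual ones such as loop devices, which a real implementation might want to filter out.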

The following metric may be recoverable from system.memory.usage, but the current description of the values of the state attribute is insufficient to do so.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.memory.limit | Total memory available in the machine. Does not include paging/swap memory. | By | UpDownCounter | Int64 | n/a | n/a |
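For reference, on Linux the value this metric describes corresponds closely to the MemTotal field of /proc/meminfo, which counts usable RAM (physical RAM minus some kernel-reserved memory) and excludes swap, which is reported separately as SwapTotal. A minimal sketch, assuming a Linux host and the usual `Key: value kB` layout of /proc/meminfo, where kB means 1024 bytes; the helper names `parse_meminfo` and `memory_limit` are hypothetical:

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, converting kB to bytes."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        # Most fields carry a "kB" unit, which the kernel uses for 1024 bytes;
        # unitless fields (e.g. HugePages_Total) are counts and are kept as-is.
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def memory_limit(meminfo_text: str) -> int:
    """Candidate value for system.memory.limit: MemTotal, which excludes swap."""
    return parse_meminfo(meminfo_text)["MemTotal"]
```

On a real host this would be called as `memory_limit(Path("/proc/meminfo").read_text())`, with `Path` imported from `pathlib`.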

The following metrics can be computed as the sum of the used and free states (plus reserved, for the filesystem metric). They could be added as convenience metrics:

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.filesystem.limit | Total storage capacity of the filesystem. | By | UpDownCounter | Int64 | device | (identifier) |
| | | | | | state | used, free, reserved |
| | | | | | type | ext4, tmpfs, etc. |
| | | | | | mode | rw, ro, etc. |
| | | | | | mountpoint | (path) |
| system.paging.limit | Total paging/swap memory available. | By | UpDownCounter | Int64 | n/a | n/a |
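Because these two limits are sums of existing usage states, a backend or processor could already derive them. The sketch below assumes usage data points arrive as simple (attributes, value) pairs, with attribute names taken from the tables above (`state`, `device`, and so on); `derive_limit`, `DataPoint` and the state sets are hypothetical and do not correspond to an OpenTelemetry SDK or Collector API. It sums the listed states and groups the result by every other attribute, so each filesystem (device, type, mode, mountpoint) gets its own limit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Simplified data point: (attributes, value in bytes). An illustrative
# stand-in, not an OpenTelemetry SDK or Collector type.
DataPoint = Tuple[Mapping[str, str], int]
# Sorted (key, value) attribute pairs identifying one output series.
SeriesKey = Tuple[Tuple[str, str], ...]

# State values that, per the issue, add up to total capacity.
FILESYSTEM_STATES = frozenset({"used", "free", "reserved"})
PAGING_STATES = frozenset({"used", "free"})


def derive_limit(usage_points: Iterable[DataPoint], states: frozenset) -> Dict[SeriesKey, int]:
    """Sum usage over `states`, grouping by every attribute except `state`."""
    totals: Dict[SeriesKey, int] = defaultdict(int)
    for attrs, value in usage_points:
        if attrs.get("state") not in states:
            continue  # ignore states that should not count toward capacity
        key = tuple(sorted((k, v) for k, v in attrs.items() if k != "state"))
        totals[key] += value
    return dict(totals)
```

For example, a filesystem reporting 60 used, 35 free and 5 reserved bytes yields a limit of 100 for that series. Grouping by all remaining attributes, rather than only `device`, keeps series separate when one device is mounted in more than one place.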

Items for this issue:

  • [ ] Add system.disk.limit to the specification
  • [x] Clarify set of attribute values for system.memory.usage state and consider using system.memory.total
  • [ ] Consider adding system.filesystem.limit and system.paging.total if convenience justifies it

This would be part of open-telemetry/opentelemetry-specification/issues/3556 if approved.

mx-psi, Jun 22 '23

Check out https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#do-not-use-total

.limit is probably the closest existing convention: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#instrument-naming

trask, Aug 13 '23

Thanks, I replaced all .total suffixes with .limit suffixes above :)

mx-psi, Aug 14 '23

Curious about next steps: will there be a PR for this change to the semantic conventions?

yevgentrukhin, Sep 07 '23

> Curious about next steps: will there be a PR for this change to the semantic conventions?

@yevgentrukhin Yes, this is part of the System Semantic Conventions WG roadmap and will be addressed before system metrics are stabilized. If you are interested in having this sooner, I am happy to review related PRs, and you can join our weekly meeting to discuss it if necessary (see here for details).

mx-psi, Sep 08 '23

> Clarify set of attribute values for system.memory.usage state and consider using system.memory.total

I have marked this one as done: since #89 we have the total value for state, which would be system.memory.limit.

mx-psi, Sep 26 '23

Discussed at the January 18th System Semantic Conventions WG meeting: we don't consider this one a blocker for system metrics GA unless the new metrics affect existing metrics. @mx-psi to double-check this.

mx-psi, Jan 18 '24

@joaopgrassi Sorry, I think I did not mention in PR https://github.com/open-telemetry/semantic-conventions/pull/1356 that it only partially solves this issue. I reckon the system.filesystem.limit and system.paging.limit metrics still need to be added.

rogercoll, Sep 02 '24