semantic-conventions
Add Pressure Stall Information (PSI) metrics (reopened #2996)
Closes #2995
Changes
This PR adds support for Linux Pressure Stall Information (PSI) metrics to the system semantic conventions.
PSI is a Linux kernel feature (available since kernel 4.20) that identifies and quantifies resource contention by measuring the time impact that CPU, memory, and I/O resource crunches have on workloads.
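For context on where the values come from, the kernel exposes PSI in `/proc/pressure/cpu`, `/proc/pressure/memory` and `/proc/pressure/io`. Each line has the form `some avg10=0.12 avg60=0.05 avg300=0.01 total=123456`. The following is a minimal parsing sketch for one such line. The `PSILine` type and `ParsePSILine` function are hypothetical names used only for illustration, and are not taken from either linked implementation PR. `strings.Cut` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PSILine holds one parsed line of /proc/pressure/{cpu,memory,io}, e.g.
// "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456".
// Hypothetical type, for illustration only.
type PSILine struct {
	StallType string  // "some" or "full"
	Avg10     float64 // percent of time stalled, 10s running average
	Avg60     float64 // percent of time stalled, 60s running average
	Avg300    float64 // percent of time stalled, 300s running average
	TotalUs   uint64  // cumulative stall time in microseconds
}

// ParsePSILine parses a single PSI line. Keys it does not recognize are
// skipped, so extra fields do not cause an error.
func ParsePSILine(line string) (PSILine, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return PSILine{}, fmt.Errorf("empty PSI line")
	}
	p := PSILine{StallType: fields[0]}
	if p.StallType != "some" && p.StallType != "full" {
		return PSILine{}, fmt.Errorf("unexpected stall type %q", p.StallType)
	}
	for _, f := range fields[1:] {
		key, val, ok := strings.Cut(f, "=")
		if !ok {
			return PSILine{}, fmt.Errorf("malformed field %q", f)
		}
		var err error
		switch key {
		case "avg10":
			p.Avg10, err = strconv.ParseFloat(val, 64)
		case "avg60":
			p.Avg60, err = strconv.ParseFloat(val, 64)
		case "avg300":
			p.Avg300, err = strconv.ParseFloat(val, 64)
		case "total":
			p.TotalUs, err = strconv.ParseUint(val, 10, 64)
		}
		if err != nil {
			return PSILine{}, fmt.Errorf("field %q: %w", f, err)
		}
	}
	return p, nil
}
```

A collector would call this once per line of each pressure file. On some kernels, `/proc/pressure/cpu` contains only a `some` line, so callers should not assume that both lines are present.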
New Metrics
- `system.linux.psi.pressure` (Gauge): Measures resource pressure as the percentage of time that tasks were stalled over a time window (10s, 60s, or 300s).
- `system.linux.psi.total_time` (Counter): Tracks the total cumulative stall time, in microseconds, since system boot.
New Attributes
- `system.psi.resource`: The resource type (`cpu`, `memory`, `io`).
- `system.psi.stall_type`: The stall severity (`some` for partial stalls, where at least one task is stalled; `full` for complete stalls, where all non-idle tasks are stalled at the same time).
- `system.psi.window`: The time window for pressure calculation (`10s`, `60s`, `300s`).
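The attributes above combine as follows: each PSI line (one resource, one stall type) yields three `system.linux.psi.pressure` points, one per window, plus one `system.linux.psi.total_time` point. The sketch below assumes that `system.psi.window` applies only to the pressure gauge, which is how the attribute description reads. `DataPoint` and `PointsForPSILine` are hypothetical, SDK-independent names. They are not the OpenTelemetry API.

```go
package main

// DataPoint is a hypothetical, SDK-independent stand-in for one metric point.
type DataPoint struct {
	Name  string
	Attrs map[string]string
	Value float64
}

// PointsForPSILine turns the values of one PSI line into metric points:
// one pressure point per window, then one cumulative total_time point.
func PointsForPSILine(resource, stallType string, avg10, avg60, avg300 float64, totalUs uint64) []DataPoint {
	// Fresh attribute map per point so points never share (and mutate) a map.
	base := func() map[string]string {
		return map[string]string{
			"system.psi.resource":   resource,
			"system.psi.stall_type": stallType,
		}
	}
	windows := []struct {
		label string
		value float64
	}{{"10s", avg10}, {"60s", avg60}, {"300s", avg300}}

	points := make([]DataPoint, 0, len(windows)+1)
	for _, w := range windows {
		attrs := base()
		attrs["system.psi.window"] = w.label
		points = append(points, DataPoint{Name: "system.linux.psi.pressure", Attrs: attrs, Value: w.value})
	}
	// Assumption: the cumulative counter carries no window attribute.
	points = append(points, DataPoint{Name: "system.linux.psi.total_time", Attrs: base(), Value: float64(totalUs)})
	return points
}
```

With three resources and up to two stall types, a host therefore reports at most 6 × 3 = 18 pressure series and 6 total-time series.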
Use Cases
PSI metrics enable:
- Sizing workloads to hardware or provisioning hardware according to workload demand
- Detecting productivity losses caused by resource scarcity
- Dynamic system management (load shedding, job migration, strategic pausing)
- Maximizing hardware utilization without sacrificing workload health
References
Relevant issues and PRs
Related issues have been filed in:
- https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/42779
- https://github.com/open-telemetry/opentelemetry-go-contrib/issues/8082
I have also opened two PRs that address these issues:
- https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/43823
- https://github.com/open-telemetry/opentelemetry-go-contrib/pull/8083
[!IMPORTANT] Pull request acceptance is subject to the triage process described in Issue and PR Triage Management. PRs that do not follow the guidance above may be automatically rejected and closed.
Merge requirement checklist
- [x] CONTRIBUTING.md guidelines followed.
- [x] Change log entry added, according to the guidelines in When to add a changelog entry.
  - If your PR does not need a change log, start the PR title with `[chore]`.
- [x] Links to the prototypes or existing instrumentations (when adding or changing conventions)
  - The Prometheus node exporter has PSI metrics enabled by default.
Reopened #2996