opentelemetry-python-contrib
opentelemetry-python-contrib copied to clipboard
`system.cpu.utilization` values are always 0 in first export from `PeriodicExportingMetricReader`
What happened?
Combining PeriodicExportingMetricReader and SystemMetricsInstrumentor produces only zero values for system.cpu.utilization in the first export, which may be the only export if the process doesn't last long.
Steps to Reproduce
from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
ConsoleMetricExporter,
PeriodicExportingMetricReader,
)
import time
exporter = ConsoleMetricExporter()
reader = PeriodicExportingMetricReader(exporter)
meter_provider = MeterProvider(metric_readers=[reader])
config = {"system.cpu.utilization": ["idle", "user", "system"]}
instrumentor = SystemMetricsInstrumentor(config=config)
instrumentor.instrument(meter_provider=meter_provider)
# Wait a bit to get realistic measurements
time.sleep(1)
Output:
{
"resource_metrics": [
{
"resource": {
"attributes": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.26.0.dev0",
"service.name": "unknown_service"
},
"schema_url": ""
},
"scope_metrics": [
{
"scope": {
"name": "opentelemetry.instrumentation.system_metrics",
"version": "0.48b0.dev",
"schema_url": "https://opentelemetry.io/schemas/1.11.0"
},
"metrics": [
{
"name": "system.cpu.utilization",
"description": "System CPU utilization",
"unit": "1",
"data": {
"data_points": [
{
"attributes": {
"state": "idle",
"cpu": 1
},
"start_time_unix_nano": 0,
"time_unix_nano": 1723642134243694000,
"value": 0.0
},
{
"attributes": {
"state": "user",
"cpu": 1
},
"start_time_unix_nano": 0,
"time_unix_nano": 1723642134243694000,
"value": 0.0
},
{
"attributes": {
"state": "system",
"cpu": 1
},
"start_time_unix_nano": 0,
"time_unix_nano": 1723642134243694000,
"value": 0.0
},
{
"attributes": {
"state": "idle",
"cpu": 2
},
"start_time_unix_nano": 0,
"time_unix_nano": 1723642134243694000,
"value": 0.0
},
{
"attributes": {
"state": "user",
"cpu": 2
},
"start_time_unix_nano": 0,
"time_unix_nano": 1723642134243694000,
"value": 0.0
},
{
"attributes": {
"state": "system",
"cpu": 2
},
"start_time_unix_nano": 0,
"time_unix_nano": 1723642134243694000,
"value": 0.0
},
"... etc for every core ..."
]
}
}
],
"schema_url": "https://opentelemetry.io/schemas/1.11.0"
}
],
"schema_url": ""
}
]
}
Expected Result
Some values greater than 0.
Actual Result
"value": 0.0 for every data point.
Additional context
In https://psutil.readthedocs.io/en/latest/#psutil.cpu_times_percent it says:
Warning: the first time this function is called with interval = 0.0 or None it will return a meaningless 0.0 value which you are supposed to ignore.
which is where the problem comes from. However, the first time a user calls cpu_times_percent in the main thread, it actually returns a nonzero value. This is because psutil has this at the top level of the module:
_last_cpu_times = {threading.current_thread().ident: cpu_times()}
This means that when metrics are read on the main thread (e.g. by adding meter_provider.force_flush() to the script above, or by using InMemoryMetricReader) then the CPU values are nonzero.
The problem is that PeriodicExportingMetricReader naturally has to perform readings on a separate thread. From the same psutil docs:
Internally this function maintains a global map (a dict) where each key is the ID of the calling thread (threading.get_ident). This means it can be called from different threads, at different intervals, and still return meaningful and independent results.
Adding the following code to the script above anywhere between reader = ... and time.sleep seems to fix the problem:
import psutil
cpu_times = psutil.cpu_times_percent(percpu=True)
psutil._last_per_cpu_times_2[reader._daemon_thread.ident] = cpu_times
Actually implementing this fix in a sensible and maintainable way seems more complicated.
Would you like to implement a fix?
No