containerpilot
Telemetry custom metrics always zero
Version: 3.8.0 GitHash: 408dbc9
The config is containerpilot.json:
{
  "consul": "localhost:8500",
  "logging": {
    "level": "DEBUG"
  },
  "jobs": [
    {
      "name": "consul-agent",
      "exec": ["/usr/local/bin/consul", "agent", "-data-dir=/data", "-config-dir=/config", "-rejoin", "-retry-join", "{{ .CONSUL }}", "-retry-max", "10", "-retry-interval", "10s"],
      "restarts": "unlimited"
    },
    {
      "name": "sensor",
      "exec": ["/usr/local/bin/update-sensors.sh"],
      "when": {
        "interval": "5s"
      }
    }
  ],
  "telemetry": {
    "port": 9090,
    "metrics": [
      {
        "name": "wp_memory_percent",
        "help": "percentage of memory used",
        "type": "gauge"
      },
      {
        "name": "wp_cpu_load",
        "help": "cpu load",
        "type": "gauge"
      }
    ]
  }
}
An extract from the log is containerpilot.log:
2018-07-05T03:59:10.494881366Z timer: {TimerExpired sensor.run-every}
2018-07-05T03:59:10.494908219Z sensor.Run start
2018-07-05T03:59:10.507228697Z sensor 700961 memory check fired
2018-07-05T03:59:10.538809037Z event: {Metric wp_memory_percent|6.91}
2018-07-05T03:59:10.544818683Z sensor 700961 cpu check fired
2018-07-05T03:59:10.73909559Z event: {Metric wp_cpu_load|0.26}
2018-07-05T03:59:10.743664642Z sensor exited without error
2018-07-05T03:59:10.743687226Z event: {ExitSuccess sensor}
2018-07-05T03:59:10.743706402Z sensor.Run end
2018-07-05T03:59:10.743732573Z sensor.term
2018-07-05T03:59:10.743742082Z terminating command 'sensor' at pid: 700961
These lines repeat every 5 seconds, with only the sensor pid and the metric event values changing.
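To compare the logged values against what the endpoint later reports, the Metric events can be pulled out of the log. This is a minimal Python sketch that assumes the `event: {Metric name|value}` line format shown in the extract above; `parse_metric_events` is a hypothetical helper name, not part of containerpilot.

```python
import re

# Matches log lines such as:
#   2018-07-05T03:59:10.538809037Z event: {Metric wp_memory_percent|6.91}
# The format is assumed from the log extract above.
METRIC_EVENT = re.compile(r"event: \{Metric (?P<name>[^|}]+)\|(?P<value>[^}]+)\}")


def parse_metric_events(log_text):
    """Return a list of (metric_name, value) pairs found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = METRIC_EVENT.search(line)
        if match:
            events.append((match.group("name"), float(match.group("value"))))
    return events


sample = """\
2018-07-05T03:59:10.538809037Z event: {Metric wp_memory_percent|6.91}
2018-07-05T03:59:10.544818683Z sensor 700961 cpu check fired
2018-07-05T03:59:10.73909559Z event: {Metric wp_cpu_load|0.26}
"""

for name, value in parse_metric_events(sample):
    print(name, value)
```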
From the log, it appears that containerpilot is receiving the metric values, but the http://localhost:9090/metrics endpoint always shows 0 for the custom metrics wp_cpu_load and wp_memory_percent:
...
# HELP wp_cpu_load cpu load
# TYPE wp_cpu_load gauge
wp_cpu_load 0
# HELP wp_memory_percent percentage of memory used
# TYPE wp_memory_percent gauge
wp_memory_percent 0
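A check like this can be scripted rather than read by eye. The sketch below, in Python, parses the text output shown above and lists the metrics that are still at zero. It assumes simple `name value` sample lines with no labels or timestamps, which is what this endpoint output contains; `parse_metrics` is a hypothetical helper name.

```python
def parse_metrics(text):
    """Parse 'name value' samples from Prometheus text output.

    Assumption: samples carry no labels or timestamps. Comment lines
    (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


sample = """\
# HELP wp_cpu_load cpu load
# TYPE wp_cpu_load gauge
wp_cpu_load 0
# HELP wp_memory_percent percentage of memory used
# TYPE wp_memory_percent gauge
wp_memory_percent 0
"""

samples = parse_metrics(sample)
# Metrics whose reported value is exactly zero.
stuck_at_zero = sorted(name for name, value in samples.items() if value == 0)
print(stuck_at_zero)
```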
Correct, the log will show the PutMetric endpoint being hit, but that doesn't necessarily mean the value was persisted.
Try something like this, where you break up the name into namespace and subsystem.
  "metrics": [
    {
      "namespace": "wp",
      "subsystem": "memory",
      "name": "percent",
      "help": "percentage of memory used",
      "type": "gauge"
    }
  ]
}
That should allow you to post a metric using the following command.
$ /bin/containerpilot -putmetric 'wp_memory_percent=42'
This is how our integration test checks a counter. Though now I'm noticing that we're lacking tests for other Prometheus metric types such as gauge.
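For reference, the Prometheus naming convention joins the non-empty namespace, subsystem, and name parts with underscores, which is why the split config is still addressed as wp_memory_percent. The Python sketch below only illustrates that convention (the Go client exposes it as `BuildFQName`); it is not containerpilot's code, and `build_fq_name` is a hypothetical name.

```python
def build_fq_name(namespace, subsystem, name):
    """Join the non-empty parts with underscores.

    Illustrative only: follows the Prometheus naming convention, where
    an empty name yields an empty result.
    """
    if not name:
        return ""
    return "_".join(part for part in (namespace, subsystem, name) if part)


# Split form, as in the suggested config.
print(build_fq_name("wp", "memory", "percent"))
# Everything in the name field, as in the original config.
print(build_fq_name("", "", "wp_memory_percent"))
```

Both calls produce the same fully qualified name, so by this convention the two configs should describe the same metric.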
Separating the name into namespace, subsystem, and name worked.
If this is now required, then the docs need updating.
From Telemetry >> Collector configuration:
You can leave off the namespace and subsystem values and put everything into the name field if desired.
That's good to know. When someone has free time, it would be nice to make sure that namespace and subsystem really are optional (and also add integration tests).
I'll leave this issue open for that.