telegraf
telegraf copied to clipboard
inputs.docker mem usage stats are computed wrong for cgroup v1
Relevant telegraf.conf
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
Logs from Telegraf
-
System info
Telegraf 1.23.3 , Ubuntu 20.04.4, Docker 20.10.17
Docker
No response
Steps to reproduce
Just get the mem usage stats from a docker host with cgroups v1.
Expected behavior
Compute mem usage like the docker cli. https://github.com/docker/cli/blob/5c511f4f856b11311557888bbb6a7ada218fef71/cli/command/container/stats_helpers.go#L239
Actual behavior
The mem cache is substracted from the mem usage, which is wrong. https://github.com/influxdata/telegraf/blob/272a0a335500070f764c733c281931f082f91f93/plugins/inputs/docker/stats_helpers.go#L54
Additional info
That mem chache is substracted from the mem usage was the default behaviour of docker cli 19.03 and before. So in the calculateMemUsageUnixNoCache function the version is determined by checking the existence of the cache field. https://github.com/influxdata/telegraf/blob/272a0a335500070f764c733c281931f082f91f93/plugins/inputs/docker/stats_helpers.go#L53 But a cache field is part of cgroups version 1 and does not say anything about the docker version. Also that compution was wrong and therefore fixed. So there is no reason to restore that behavior, even for older hosts.
The change occurred in https://github.com/influxdata/telegraf/pull/10491
@101100 Thoughts on removing vs keeping the check for the cache field? It does look like upstream does not have this same check.
@powersj The upstream doesn't have the check because it knows it is the latest Docker version. 19.03 is pretty old, though, so eliminating the code for that version would be OK and wouldn't affect me.
Another option would be to just prioritize the latest version and fallback on cache. It might be that total_inactive_file and inactive_file aren't in the old version, so you could just try the fields for the latest version and fallback on cache by inverting the order of the if statements:
func CalculateMemUsageUnixNoCache(mem types.MemoryStats) float64 {
// Latest Docker and cgroup v1
if v, isCgroup1 := mem.Stats["total_inactive_file"]; isCgroup1 && v < mem.Usage {
return float64(mem.Usage - v)
}
// Latest Docker and cgroup v2
if v := mem.Stats["inactive_file"]; v < mem.Usage {
return float64(mem.Usage - v)
}
// Docker 19.03 and older fallback
if v, isOldDocker := mem.Stats["cache"]; isOldDocker && v < mem.Usage {
return float64(mem.Usage - v)
}
return float64(mem.Usage)
}
@fmorillo sorry for the delay on this one. Would you be willing to try the artifacts on PR #12616. There should be artifacts attached in 20-30mins that you can use.
Thanks!
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!