Nomad does not register OOM kills of its own Docker containers

Open Alexsandr-Random opened this issue 1 year ago • 0 comments

Nomad version

Nomad v1.8.3 BuildDate 2024-08-13T07:37:30Z Revision 63b636e5cbaca312cf6ea63e040f445f05f00478

Operating system and Environment details

Ubuntu 22.04.4 LTS

Issue

When a container runs out of memory, the kernel kills it, and Nomad is then expected to detect and register that kill. We can then see it in our monitoring system via Prometheus and the metric nomad_client_allocs_oom_killed.

However, we now see some OOM kills that Nomad does not register; they show up only in syslog. Why does this happen? It looks like a bug.

Here are some of the logs we found (we are 100% sure that the container is managed by Nomad):

Oct 30 10:13:57 main kernel: [ 7096.161113] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=fd92d65e92f9edc800266ece977e4316a31716bf6f008ba834b4a930b19b20e3,mems_allowed=0,oom_memcg=/docker/fd92d65e92f9edc800266ece977e4316a31716bf6f008ba834b4a930b19b20e3,task_memcg=/docker/fd92d65e92f9edc800266ece977e4316a31716bf6f008ba834b4a930b19b20e3,task=php-fpm,pid=79699,uid=1000
Oct 30 10:13:57 main kernel: [ 7096.161121] Memory cgroup out of memory: Killed process 79699 (php-fpm) total-vm:568056kB, anon-rss:107940kB, file-rss:11516kB, shmem-rss:6280kB, UID:1000 pgtables:524kB oom_score_adj:0
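
To correlate syslog entries like these with Nomad allocations, the kernel's oom-kill summary line can be parsed for the constraint type, the killed task, and the Docker container ID. The following is only a minimal sketch: the function name `parse_oom_kill` is hypothetical, and it assumes the `/docker/<container id>` cgroup path layout seen in the log above.

```python
import re

# Matches the key=value summary the kernel prints on the "oom-kill:" line.
OOM_KILL_RE = re.compile(
    r"oom-kill:constraint=(?P<constraint>\w+),.*?"
    r"task_memcg=(?P<task_memcg>[^,]+),"
    r"task=(?P<task>[^,]+),pid=(?P<pid>\d+),uid=(?P<uid>\d+)"
)


def parse_oom_kill(line):
    """Return a dict describing one oom-kill log line, or None if no match.

    Hypothetical helper: it only understands the line format shown in
    this issue and the /docker/<id> cgroup layout (an assumption).
    """
    match = OOM_KILL_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["uid"] = int(info["uid"])
    # Extract the 64-character container ID from the task's cgroup path.
    cid = re.fullmatch(r"/docker/([0-9a-f]{64})", info["task_memcg"])
    info["container_id"] = cid.group(1) if cid else None
    return info
```

The extracted container ID can then be compared against the IDs of the containers started by Nomad on that client, for example from `docker ps --no-trunc`.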

Here is the actual behavior when a kill is registered by Nomad (although earlier kills were not registered): image

Here is the behavior when a kill is not registered by Nomad at all: image

Expected Result

Nomad registers all OOM kills of the containers it orchestrates

Actual Result

Nomad registers only some of the OOM kills of the containers it orchestrates

I have only one guess. There are two different OOM kill situations: the first is when the memory of the entire Nomad client is exhausted (kill via kernel); the second is when the limits set for the container in memory and memory_max are exhausted (kill via Nomad). Under these conditions the two are independent of each other, and Nomad cannot register the kill that comes from the kernel, although that would be very useful.
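
One way to check this guess independently of Nomad might be to read the kernel's own per-cgroup counters. Note that `constraint=CONSTRAINT_MEMCG` in the log above indicates the kill was triggered by a memory cgroup limit rather than by host-wide memory exhaustion. The sketch below assumes cgroup v2, where each cgroup directory exposes a `memory.events` file with an `oom_kill` counter; the function names are hypothetical, and the directory to pass in depends on Docker's cgroup driver, so it is left to the caller.

```python
from pathlib import Path


def parse_memory_events(text):
    """Parse the content of a cgroup v2 memory.events file into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Each valid line has the form "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kill_count(cgroup_dir):
    """Return the oom_kill counter for a cgroup directory.

    Returns None if memory.events does not exist there (for example on
    cgroup v1, or if the path is wrong). The path is an assumption
    supplied by the caller, not something Nomad or Docker reports here.
    """
    events = Path(cgroup_dir) / "memory.events"
    if not events.is_file():
        return None
    return parse_memory_events(events.read_text()).get("oom_kill", 0)
```

Comparing this counter over time with nomad_client_allocs_oom_killed would show how many kernel kills inside the container's cgroup were never reported by Nomad.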

Alexsandr-Random avatar Oct 30 '24 12:10 Alexsandr-Random