TaskOOM event lost
Description
https://github.com/kubernetes/kubernetes/issues/119600
While troubleshooting a case where the termination Reason was not OOMKilled, I found that the container's TaskOOM event was lost.
Steps to reproduce the issue
e2e: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-containerd-node-e2e-1-7/1683184526936772608
logfile: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-containerd-node-e2e-1-7/1683184526936772608/artifacts/tmp-node-e2e-92a26bb7-cos-97-16919-353-1-system.log
containerd_id: 98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54
containerd_name: oomkill-multi-target-container
- The kernel triggered an OOM kill:
Jul 23 18:56:46 tmp-node-e2e-92a26bb7-cos-97-16919-353-1 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod17efb6ec_09ab_4eec_8ad6_f953eb1a7466.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod17efb6ec_09ab_4eec_8ad6_f953eb1a7466.slice/cri-containerd-98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54.scope,task=dd,pid=38287,uid=0
Jul 23 18:56:46 tmp-node-e2e-92a26bb7-cos-97-16919-353-1 kernel: Memory cgroup out of memory: Killed process 38287 (dd) total-vm:21724kB, anon-rss:14744kB, file-rss:508kB, shmem-rss:0kB, UID:0 pgtables:68kB oom_score_adj:997
Jul 23 18:56:46 tmp-node-e2e-92a26bb7-cos-97-16919-353-1 kernel: Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod17efb6ec_09ab_4eec_8ad6_f953eb1a7466.slice/cri-containerd-98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54.scope are going to be killed due to memory.oom.group set
- Then containerd's shim exited:
Jul 23 18:56:46 tmp-node-e2e-92a26bb7-cos-97-16919-353-1 containerd[910]: time="2023-07-23T18:56:46.329304474Z" level=debug msg="received exit event container_id:\"98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54\" id:\"98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54\" pid:38265 exit_status:137 exited_at:{seconds:1690138606 nanos:325950236}"
Jul 23 18:56:46 tmp-node-e2e-92a26bb7-cos-97-16919-353-1 containerd[910]: time="2023-07-23T18:56:46.389869167Z" level=info msg="shim disconnected" id=98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54 namespace=k8s.io
Jul 23 18:56:46 tmp-node-e2e-92a26bb7-cos-97-16919-353-1 containerd[910]: time="2023-07-23T18:56:46.390142692Z" level=warning msg="cleaning up after shim disconnected" id=98bac6c4b73cfd4c647647aa80248208b67de059874d4ddfdd1aa43ea3967e54 namespace=k8s.io
- After that, no TaskOOM event was received before the container was deleted.
Describe the results you received and expected
Expected to receive a TaskOOM event.
What version of containerd are you using?
1.7
Any other relevant information
ubuntu
Show configuration if it is related to CRI plugin.
no
@Iceber is this the issue you mentioned before? Say the init command is sh -c "oom-process". The "oom-process" triggers the OOM event, but the sh -c does not, so containerd only records the sh -c exit code instead of the sub-process's.
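A self-contained sketch (plain Go, not containerd code) of that exit-code laundering: the process being waited on is the wrapper shell, so the watcher only ever sees the shell's exit status, with kill -9 standing in for the kernel OOM kill.

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// The inner shell stands in for "oom-process"; kill -9 mimics the
	// kernel OOM killer terminating it.
	cmd := exec.Command("sh", "-c", `sh -c 'kill -9 $$'; exit $?`)
	_ = cmd.Run()

	// Only the outer shell's status (137 = 128+SIGKILL) is visible here;
	// a shim waiting on an init process of `sh -c "oom-process"` is in
	// the same position and records the shell's exit, not the child's.
	fmt.Println("observed exit code:", cmd.ProcessState.ExitCode())
}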
After discussing with @Iceber, we think the shim process exits too quickly, so the shim never receives the OOM event.
My idea: the OOM watch registration (s.ep.Add) currently runs after the process is started (container.Start); could we move it before the start? A rough sketch follows the link below.
https://github.com/containerd/containerd/blob/40f26543bdc27cbe8b058ac082e91c5832bb1c41/runtime/v2/runc/task/service.go#L263-L273
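Roughly what I mean, as a sketch with placeholder types (oomWatcher, task and their methods are hypothetical stand-ins, not the real shim API): register the cgroup with the watcher first, then start, and roll the registration back if the start fails.

package sketch

import (
	"context"
	"fmt"
)

// Hypothetical stand-ins for the shim's OOM watcher ("s.ep") and container;
// the real containerd types and signatures differ.
type oomWatcher interface {
	Add(id, cgroupPath string) error
	Remove(id string)
}

type task interface {
	ID() string
	CgroupPath() string
	Start(ctx context.Context) error
}

// startWithOOMWatch registers the cgroup before the workload runs, so an OOM
// that fires right after start (or a shim that exits very quickly afterwards)
// cannot race past the watcher registration.
func startWithOOMWatch(ctx context.Context, w oomWatcher, t task) error {
	if err := w.Add(t.ID(), t.CgroupPath()); err != nil {
		return fmt.Errorf("add cgroup to OOM watcher: %w", err)
	}
	if err := t.Start(ctx); err != nil {
		// Undo the registration if the container never actually started.
		w.Remove(t.ID())
		return err
	}
	return nil
}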
It is also possible that the cgroup was added to the OOM watcher only after the container was started; by that point the process had already OOMed, so the cgroup watcher missed the OOM event.
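As an illustration of how such a pre-registration OOM could at least be detected (just a sketch, not something containerd does today): on cgroup v2 the memory.events file carries an oom_kill counter, so the watcher could read it when the cgroup is registered and treat a non-zero count as an OOM that happened before the watch started.

package sketch

import (
	"bufio"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// oomKillCount reads the oom_kill counter from a cgroup v2 memory.events
// file (e.g. /sys/fs/cgroup/<slice>/<scope>/memory.events). Calling this at
// watcher-registration time lets a non-zero value be reported as an OOM
// that would otherwise have been lost.
func oomKillCount(cgroupDir string) (uint64, error) {
	f, err := os.Open(filepath.Join(cgroupDir, "memory.events"))
	if err != nil {
		return 0, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) == 2 && fields[0] == "oom_kill" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, s.Err()
}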
/assign
Could this be the same as https://github.com/containerd/containerd/issues/8180?
Looks like a problem.
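// TaskOOM (from containerd's api/events) is the event the shim publishes when
// a task is OOM-killed; ContainerID is expected to identify that container.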
type TaskOOM struct {
state protoimpl.MessageState
sizeCache protoimpl.SizeCache
unknownFields protoimpl.UnknownFields
ContainerID string `protobuf:"bytes,1,opt,name=container_id,json=containerId,proto3" json:"container_id,omitempty"`
}
I found that the ContainerID value is not actually the container ID; it is the SandboxID.
- failed test log: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-containerd-node-e2e-1-7/1697683704735862784/artifacts/tmp-node-e2e-b9a730eb-ubuntu-gke-2204-1-24-v20220623-system.log
- Received a TaskOOM event, but the container_id value 4bcba7fc4369fee4d7d2e6263b58fce52cfc928c6a31cfac4d43c838708991ce is wrong: it is the SandboxID.
$ Sep 01 19:12:33 tmp-node-e2e-b9a730eb-ubuntu-gke-2204-1-24-v20220623 containerd[1472]: time="2023-09-01T19:12:33.012922721Z" level=info msg="TaskOOM event container_id:\"4bcba7fc4369fee4d7d2e6263b58fce52cfc928c6a31cfac4d43c838708991ce\""
$ Sep 01 19:12:34 tmp-node-e2e-b9a730eb-ubuntu-gke-2204-1-24-v20220623 containerd[1472]: time="2023-09-01T19:12:34.733787645Z" level=info msg="StopPodSandbox for \"4bcba7fc4369fee4d7d2e6263b58fce52cfc928c6a31cfac4d43c838708991ce\""
- success test log: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-containerd-node-e2e-1-7/1696958640436547584/artifacts/tmp-node-e2e-d1698486-cos-97-16919-353-23-system.log
- Received a TaskOOM event, and the container_id is correct.
$ Aug 30 19:08:12 tmp-node-e2e-d1698486-cos-97-16919-353-23 containerd[922]: time="2023-08-30T19:08:12.422392986Z" level=info msg="CreateContainer within sandbox \"5df73d21cde36f876f4692a23354ec6e8858851f8fcd2ad20844e7efb7d4d98c\" for &ContainerMetadata{Name:oomkill-single-target-container,Attempt:0,} returns container id \"7c19542bf60d2091b242cc5997afc8224601e0f3f4758385c3e257b27c214ac3\""
$ Aug 30 19:08:18 tmp-node-e2e-d1698486-cos-97-16919-353-23 containerd[922]: time="2023-08-30T19:08:18.386243530Z" level=info msg="TaskOOM event container_id:\"7c19542bf60d2091b242cc5997afc8224601e0f3f4758385c3e257b27c214ac3\""
Cross-linking a CRI-O workaround for this: https://github.com/kubernetes/kubernetes/pull/120460#issuecomment-1708794531
We have also hit this; I can reproduce it relatively easily via:
ctr run --memory-limit $((500 * 1024 * 1024)) --mount type=tmpfs,options=size=1G,dst=/tester docker.io/library/ubuntu:latest foo /bin/sh -c 'cat /dev/urandom > /tester/foo'
Hi, @lengrongfu. I'm Dosu, and I'm helping the containerd team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- The issue involves a missing TaskOOM event in a Kubernetes container, where the termination reason was not marked as OOMKilled.
- Discussions suggest the shim process may exit too quickly, or the cgroup was added to the OOM watcher too late.
- SergeyKanzhelev linked a similar issue (#8180) and mentioned a workaround in CRI-O.
- Tych0 provided a method to reproduce the issue.
- You identified a discrepancy where the TaskOOM event's ContainerID was incorrectly set to the SandboxID.
Next Steps:
- Please confirm if this issue is still relevant to the latest version of the containerd repository by commenting on this issue.
- If there is no further activity, the issue will be automatically closed in 7 days.
Thank you for your understanding and contribution!
This is still happening with containerd 2.1.4, and it doesn't seem to be fixed based on the 2.1.5 release notes.
Can you reproduce it, @patrickshan?
https://github.com/containerd/containerd/pull/12614 may reduce the probability of this happening. @patrickshan
We’ve observed this behavior occasionally in our cluster. After further investigation, we found there was an 8-minute delay between when the TaskOOM event occurred and when containerd actually received it, due to the node being under some load at the time. This appears to be a different issue from the one your PR addresses.
func (s *service) oomEvent(id string) {
send := make(chan struct{})
go func() {
s.shutdown.RegisterCallback(func(context.Context) error {
// Wait oomEvent publish done.
<-send
return nil
})
}()
// add some log
err := s.publisher.Publish(s.context, runtime.TaskOOMEventTopic, &eventstypes.TaskOOM{
ContainerID: id,
})
// add some log
if err != nil {
log.G(s.context).WithError(err).Error("post event")
}
close(send)
}
You can add some logging before and after s.publisher.Publish to find out what is taking so much time, @patrickshan.
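For instance, a sketch of that kind of instrumentation (stdlib log and a simplified Publish signature; the real containerd publisher interface differs):

package sketch

import (
	"context"
	"log"
	"time"
)

// publisher mirrors just the Publish call used by oomEvent above; the real
// type is the shim's TTRPC event publisher.
type publisher interface {
	Publish(ctx context.Context, topic string, event interface{}) error
}

// publishTimed logs before and after Publish so a slow delivery (like the
// 8-minute delay described above) shows up together with its duration.
func publishTimed(ctx context.Context, p publisher, topic string, event interface{}) error {
	start := time.Now()
	log.Printf("publishing %s", topic)

	err := p.Publish(ctx, topic, event)

	log.Printf("published %s in %s (err=%v)", topic, time.Since(start), err)
	return err
}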