GameServer stuck on state Scheduled when Pod failed with reason OutOfpods
What happened:
Agones didn't create a new Pod when a Pod failed with reason OutOfpods, and the GameServer was stuck in the Scheduled state.
What you expected to happen:
The GameServer is expected to create a new Pod if a Pod fails with reason OutOfpods.
How to reproduce it (as minimally and precisely as possible):
- Put the following manifest in /etc/kubernetes/manifests/static-pod.manifest on the test node:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: kube-system
  labels:
    component: nginx
    tier: node
spec:
  hostNetwork: true
  containers:
  - name: nginx
    image: nginx:1.14.2
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: 100m
  priorityClassName: system-node-critical
  priority: 2000001000
  tolerations:
  - effect: NoExecute
    operator: Exists
  - effect: NoSchedule
    operator: Exists
- Set the Fleet replicas to the Pod capacity of the node.
- Confirm that some of the gameserver Pods are stuck in the Pending state.
- Forcibly delete the static pod created in step (1) in kube-system:
  kubectl delete pod --force --grace-period=0 <static-pod-name> -n kube-system
- All gameserver Pods that were stuck in Pending become Failed with reason OutOfpods.
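At this point the GameServers backing those Pods remain stuck in the Scheduled state. The easiest way I found to confirm this is kubectl get gameservers together with kubectl get pods -l agones.dev/role=gameserver, which show the GameServers still Scheduled while the backing Pods are Failed with reason OutOfpods (exact names and output will differ per cluster).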
Anything else we need to know?:
Here is the Pod status that I reproduced:
status:
  message: 'Pod Node didn''t have enough resource: pods, requested: 1, used: 32, capacity: 32'
  phase: Failed
  reason: OutOfpods
I created the Fleet from the official documentation.
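For reference, the Fleet itself was essentially the simple-game-server example from the documentation, with only the replicas adjusted to the node's Pod capacity; roughly (values illustrative, not my exact manifest):
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: simple-game-server
  namespace: default
spec:
  replicas: 32   # set to the node's Pod capacity
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          containers:
          - name: simple-game-server
            image: gcr.io/agones-images/simple-game-server:0.13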
Environment:
- Agones version: 1.20.0
- Kubernetes version (use kubectl version): Client Version: v1.21.0, Server Version: v1.21.12-gke.1500
- Cloud provider or hardware configuration: GKE
- Install method (yaml/helm): yaml
- Troubleshooting guide log(s):
- Others:
The GameServer is expected to create a new Pod if a Pod fails with reason OutOfpods.
Sorry, but maybe I'm missing something - but how is Agones supposed to create a new Pod if there isn't room in the cluster?
Or do you mean that Agones doesn't recover when there should be room?
I'm also concerned that if you delete all the pods in the kube-system namespace, you are also breaking Kubernetes.
The GameServer is expected to create a new Pod if a Pod fails with reason OutOfpods.
Sorry, but maybe I'm missing something - but how is Agones supposed to create a new Pod if there isn't room in the cluster?
Or do you mean that Agones doesn't recover when there should be room?
I mean that Agones doesn't recover when there should be room.
I'm also concerned that if you delete all the pods in the kube-system namespace, you are also breaking Kubernetes.
Usually, if a Node has exceeded its Pod capacity and there are no other schedulable Nodes, the Pod stays stuck in Pending. Since failure with OutOfpods is rare, it was likely necessary to put the cluster into a broken state in order to reproduce it. As far as I could tell, simply deleting the GameServer did not reproduce it.
I have updated the reproduction procedure to be more accurate. By forcibly deleting Pods, it can be reproduced in one attempt.
I'm curious - what does kubectl delete pod --force --grace-period=0 --all -n kube-system force to happen?
Is that a required step to replicate the issue?
Also, can you share a kubectl describe of the Pod that has failed please as well?
It sounds like we should move the GameServer to Unhealthy if the backing Pod moves to an OutOfpods state, but I'm just trying to nail down exactly what is happening here.
I'm curious - what does kubectl delete pod --force --grace-period=0 --all -n kube-system force to happen? Is that a required step to replicate the issue?
I tried forcibly deleting only the GameServer Pods, but still could not reproduce the problem. So I checked which kube-system component actually causes the OutOfpods failure: it is kube-proxy.
I have updated the reproduction method with what I confirmed.
Also, can you share a kubectl describe of the Pod that has failed please as well?
Sure, here it is.
Name: simple-game-server-qxtcq-wbs6p
Namespace: default
Priority: 0
Node: gke-friday-developme-gameserver-pool--f788a1c2-szxx/
Start Time: Thu, 28 Jul 2022 03:15:22 +0000
Labels: agones.dev/gameserver=simple-game-server-qxtcq-wbs6p
agones.dev/role=gameserver
Annotations: agones.dev/container: simple-game-server
agones.dev/sdk-version: 1.20.0
cluster-autoscaler.kubernetes.io/safe-to-evict: false
Status: Failed
Reason: OutOfpods
Message: Pod Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32
IP:
IPs: <none>
Controlled By: GameServer/simple-game-server-qxtcq-wbs6p
Containers:
agones-gameserver-sidecar:
Image: gcr.io/agones-images/agones-sdk:1.20.0
Port: <none>
Host Port: <none>
Args:
--grpc-port=9357
--http-port=9358
Requests:
cpu: 30m
Liveness: http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
Environment:
GAMESERVER_NAME: simple-game-server-qxtcq-wbs6p
POD_NAMESPACE: default (v1:metadata.namespace)
FEATURE_GATES: CustomFasSyncInterval=false&Example=true&NodeExternalDNS=true&PlayerAllocationFilter=false&PlayerTracking=false&SDKGracefulTermination=false&StateAllocationFilter=false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s4rzz (ro)
simple-game-server:
Image: gcr.io/agones-images/simple-game-server:0.13
Port: 7654/UDP
Host Port: 7258/UDP
Requests:
cpu: 0
memory: 0
Liveness: http-get http://:8080/gshealthz delay=5s timeout=1s period=5s #success=1 #failure=3
Environment:
AGONES_SDK_GRPC_PORT: 9357
AGONES_SDK_HTTP_PORT: 9358
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from empty (ro)
Volumes:
empty:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-s4rzz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m44s default-scheduler 0/1 nodes are available: 1 Too many pods.
Warning FailedScheduling 2m42s default-scheduler 0/1 nodes are available: 1 Too many pods.
Normal Scheduled 43s default-scheduler Successfully assigned default/simple-game-server-qxtcq-wbs6p to gke-friday-developme-gameserver-pool--f788a1c2-szxx
Normal NotTriggerScaleUp 2m44s cluster-autoscaler pod didn't trigger scale-up:
Warning OutOfpods 44s kubelet Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32
Can you reproduce the issue without actively deleting Kubernetes components?
Today I tried to reproduce it without forcibly deleting kube-proxy, but I couldn't.
First, I deployed a static pod with a spec similar to the kube-proxy Pod, then deleted it forcibly, but that didn't work.
Here is the set of spec fields I tried to align with kube-proxy:
- ownerReferences
- priorityClassName
- tolerations
- nodeName
- resource.requests.cpu
The actual Pod resource is this:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: kube-system
  labels:
    component: nginx
    tier: node
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: gke-friday-developme-gameserver-pool--f788a1c2-pdj1
    uid: 37d22a61-6e19-4729-bf3e-86a8823c9215
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: 100m
  nodeName: gke-friday-developme-gameserver-pool--f788a1c2-pdj1
  priorityClassName: system-node-critical
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    operator: Exists
  - effect: NoSchedule
    operator: Exists
Next, I tried setting pods to 0 in a ResourceQuota and forcibly deleting the GameServer, which also did not work.
# deploy to the same namespace as the GameServers
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pod-counts
  namespace: default
spec:
  hard:
    pods: "0"
Here is the log for kubelet.
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.860377 14147 kubelet.go:1950] "SyncLoop DELETE" source="api" pods=[kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1]
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.860644 14147 kubelet_pods.go:1520] "Generating pod status" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.861193 14147 kubelet.go:1668] "Trying to delete pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1" podUID=9c095625-5113-476e-a638-bc8a78a9271b
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.861335 14147 mirror_client.go:125] "Deleting a mirror pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1" podUID=0xc000b4b060
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.872105 14147 config.go:278] "Setting pods for source" source="api"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.873388 14147 kubelet.go:1944] "SyncLoop REMOVE" source="api" pods=[kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1]
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.883964 14147 config.go:278] "Setting pods for source" source="api"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.884289 14147 config.go:383] "Receiving a new pod" pod="default/simple-game-server-psqqb-hnz28"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.887060 14147 volume_manager.go:394] "Waiting for volumes to attach and mount for pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.887324 14147 volume_manager.go:425] "All volumes are attached and mounted for pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.887937 14147 kuberuntime_manager.go:711] "computePodActions got for pod" podActions={KillPod:false CreateSandbox:false SandboxID:2d3ba683c2accef9b8e33e0d44ff5602267cb2b4645f5cce78d4f04ff6c20c2a Attempt:0 NextInitContainerToStart:nil ContainersToStart:[] ContainersToKill:map[] EphemeralContainersToStart:[]} pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.888478 14147 kubelet.go:1934] "SyncLoop ADD" source="api" pods=[default/simple-game-server-psqqb-hnz28]
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.888680 14147 topology_manager.go:187] "Topology Admit Handler"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.889513 14147 predicate.go:143] "Predicate failed on Pod" pod="simple-game-server-psqqb-hnz28_default(34520514-3a31-4fc2-a039-5727211e7f4b)" err="Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.890657 14147 event.go:291] "Event occurred" object="default/simple-game-server-psqqb-hnz28" kind="Pod" apiVersion="v1" type="Warning" reason="OutOfpods" message="Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.922069 14147 status_manager.go:586] "Patch status for pod" pod="default/simple-game-server-psqqb-hnz28" patchBytes="{\"metadata\":{\"uid\":\"34520514-3a31-4fc2-a039-5727211e7f4b\"},\"status\":{\"conditions\":null,\"message\":\"Pod Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32\",\"phase\":\"Failed\",\"qosClass\":null,\"reason\":\"OutOfpods\",\"startTime\":\"2022-07-29T01:42:42Z\"}}"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.922330 14147 status_manager.go:595] "Status for pod updated successfully" pod="default/simple-game-server-psqqb-hnz28" statusVersion=1 status={Phase:Failed Conditions:[] Message:Pod Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32 Reason:OutOfpods NominatedNodeName: HostIP: PodIP: PodIPs:[] StartTime:2022-07-29 01:42:42 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[] QOSClass: EphemeralContainerStatuses:[]}
Any ideas to reproduce?
I have updated the reproduction steps. I said I deployed a static pod in kube-system, but I actually deployed a bare Pod, not a static pod. When I created a static pod as in the updated steps, the issue was reproduced without deleting kube-proxy.
Maybe a silly question but - why would someone add that manifest to a node? (most people I would expect are on cloud providers and either (a) don't have access or (b) will have it overwritten pretty fast)
According to the documentation, static Pods are meant to let users deploy their own control plane components, but I'm not sure whether there are other use cases where users actually use them. Just to be clear, I used a static Pod only to reproduce this issue; I did not use one in a real environment.
We're currently working around this issue by running the descheduler to evict Pods that failed with OutOfpods. However, there is a delay, since the descheduler runs as a CronJob.
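For anyone curious, the policy is roughly the RemoveFailedPods strategy filtered to the OutOfpods reason; a sketch along these lines (field names from the descheduler v1alpha1 policy as far as I recall, so please check them against the descheduler version you run):
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveFailedPods":
    enabled: true
    params:
      failedPods:
        reasons:
        - "OutOfpods"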
So, I would like to submit a PR that solves this issue, but I don't feel I can put a reproduction method into a test case... 😢
Today I encountered the same behavior as this issue when a Pod failed with OutOfcpu.
It seems that when a Pod fails for certain reasons, the GameServer does not recover automatically 🤔
@markmandel
I've been working on this issue, and it seems it is caused by the insufficient-resource error in the kubelet: https://github.com/kubernetes/kubernetes/blob/v1.21.12/pkg/kubelet/lifecycle/predicate.go#L140
My suggestion for fixing this is to add a condition here that checks whether the Pod failed because of an insufficient-resource error: https://github.com/googleforgames/agones/blob/main/pkg/gameservers/health.go#L105
If it's OK to apply this fix, I'll submit a PR for it.
What do you think?
I'll be honest, I'm still not understanding what the actual issue here is. It seems like you have to break Kubernetes to make it happen -- which doesn't sound like an actual bug, it sounds like an extreme edge case.
Also, a Pod in pending state is an indicator to the cluster autoscaler that it should expand the cluster - so changing that behaviour says to me that we should leave this alone.
If you can't replicate this issue without actually messing around with the underlying Kubernetes system, I'm not sure we should be considering a fix here?
I too am aware that I have hit an edge case rather than found a bug. We currently downsize the entire cluster after work hours and scale it back up before work hours start, and I feel this makes it easier to hit edge cases.
Also, a Pod in pending state is an indicator to the cluster autoscaler that it should expand the cluster - so changing that behaviour says to me that we should leave this alone.
There may be some misunderstanding in this part. The problem is not a Pod in the Pending state, but one in the Failed state. The GameServer does not recreate a backing Pod that is in the Failed state, and the GameServer never transitions out of the Scheduled state, even if the cluster autoscaler scales out nodes.
This may prevent the FleetAutoscaler from working properly, since the GameServer is stuck in the Scheduled state and the FleetAutoscaler cannot scale the GameServers until the Failed Pod is manually deleted.
If you can't replicate this issue without actually messing around with the underlying Kubernetes system, I'm not sure we should be considering a fix here?
Maybe I should ask the sig-node community for help replicating the issue without killing static pods.
I don't know if it's worth mentioning, but a Pod created from a Deployment does not get stuck in this state and becomes Running.
So my thought is that it would be nice if Pods created by a GameServer could recover in the same way.
Gotcha!
So ultimately it sounds like if a Pod status is Failed, we should handle that general case (less of an issue with OutOfpods, but more generally if a Pod is in a Failed state).
I wonder if there is an easy way to just create a Failed Pod somehow, and use that as our test case. It does sound like if a Pod has failed for any reason, it should be moved to Unhealthy anyway.
I had a quick look to see if there was an easy way to make that happen though. Did you have any luck with sig-node?
Thank you for straightening out the issue.
I wonder if there is an easy way to just create a Failed Pod somehow, and use that as our test case. It does sound like if a Pod has failed for any reason, it should be moved to Unhealthy anyway. I had a quick look to see if there was an easy way to make that happen though. Did you have any luck with sig-node?
Sorry, I've been too busy to ask the community for help yet, but I will do so soon.
@markmandel
Sorry, it took me too long to ask the question. I posted it to the k8s community Slack channel, but couldn't get an answer. I've also tried to create a failed Pod, but I have no idea how to change a Pod's status.phase to Failed.
Have you found any way to create a Failed Pod?
Looking at: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
Failed: All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system.
If this happens (assuming there is a Fleet in use), this will kick the GameServer into Unhealthy, and it will then get replaced with a new Pod.
...also, if we can't replicate the issue, is it an issue? 😄
@markmandel
I found a manifest to replicate the issue.
We have to set restartPolicy to Never, then exit the containers with a non-zero status.
To exit all containers with a non-zero status, I had to add hostPID: true so that all container process IDs are visible.
I also found that exiting the containers with a zero status gets the Pod stuck in phase Succeeded. Should we handle this case, too? 🤔
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
name: simple-game-server
namespace: default
spec:
replicas: 1
template:
spec:
ports:
- name: default
containerPort: 7654
template:
spec:
hostPID: true
restartPolicy: Never
containers:
- name: simple-game-server
image: gcr.io/agones-images/simple-game-server:0.13
command:
- sh
- -cx
- |
pgrep -n sdk-server | xargs kill -9
exit 1
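Applying this Fleet, the containers exit with a non-zero status almost immediately and the Pod goes to phase Failed, and as far as I can tell the GameServer does not recover on its own. Watching with kubectl get gs -w and kubectl get pods -w in the namespace shows the Pod sitting in Failed while the GameServer never moves to Unhealthy (exact output will differ).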
Related issue: https://github.com/googleforgames/agones/issues/2361
I would like to chime in here as it seems like it's the same issue.
There is a relatively fresh Kubernetes feature - https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown - and it seems like it can lead to Pods being transitioned into the Failed state too:
Status: Failed
Reason: Terminated
Message: Pod was terminated in response to imminent node shutdown.
I think the controlling GameServer should indeed be moved to Unhealthy here as the correct reaction.
I don't have a concrete way to reproduce this, unfortunately, as we encountered the issue in production on a loaded cluster.
But I can guess that this will happen if the node's shutdownGracePeriod and shutdownGracePeriodCriticalPods are not 0 (so the feature is enabled) but are not long enough to actually terminate the containers inside the Pod gracefully, because they have a bigger terminationGracePeriodSeconds and actually use it up.
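For reference, the feature is driven by the kubelet configuration; a minimal sketch of what I mean (values purely illustrative) - graceful node shutdown enabled, but with windows that can be shorter than a game server Pod's terminationGracePeriodSeconds:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s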
This is all kinds of fun, because of Pod restarts 😄 so I appreciate people digging in.
One thing I'm trying to work out from the documentation is: if the Pod is in a Failed state, is that the final state?
I.e. do we know if a Pod could restart its way out of a Failed state? @roberthbailey @zmerlynn @gongmax @igooch any of you know? The docs don't seem clear to me.
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
status.phase is a synthetic view really meant for humans. As pointed out in the link: "The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. The phase is not intended to be a comprehensive rollup of observations of container or Pod state, nor is it intended to be a comprehensive state machine." Really, you can think of status.phase as aggregating the state of each container and potentially other Pod conditions.
That said, with restartPolicy: Never, I would expect Failed to be terminal (except for possibly some nuance around the state of the SDK sidecar container). It would be useful to have the full kubectl get -oyaml for the Pod in question rather than the describe view, just to see.
Since we run (by default) with restartPolicy: Always on the Pod, we have to assume there is a restart.
@unlightable in your situation, I assume once the node was torn down, the GameServer was replaced on another node?