agones
Simple game server eats too much CPU
What happened:
Simple game server eats too much CPU while doing nothing
What you expected to happen:
A deployment without CPU limits should not use CPU until I call it intensively via netcat.
How to reproduce it (as minimally and precisely as possible):
Deploy the simple game server on Windows without a restricted fleet. It eats a whole core, while an industrial-grade, full-blown real game server eats only a quarter of one.
Anything else we need to know?:
"debug" is simple-game-server 0.3 from gcr, running on Windows:

debug-dev-mw48w-48jxr   891m   117Mi
debug-dev-mw48w-lm6tq   884m   118Mi
Environment:
- Agones version: 1.15
- Kubernetes version (use kubectl version): 1.18
- Cloud provider or hardware configuration: Azure
- Install method (yaml/helm): yaml
- Troubleshooting guide log(s):
- Others:
Hi @dzmitry-lahoda - thanks for filing this bug report. Can you provide a bit more information / clarification for me?
- What do you mean by "unrestricted deployment" and "restricted fleet"?
- Are you running with udp only (the default), tcp only, or both?
- Are you deploying the game server in a fleet with the example yaml file or are you using a different config?
- What version of Windows are you running on the node?
Thanks!
I did not set a CPU limit in the GameServer template. I will set one as a fix, to restrict it.
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: debug
spec:
  strategy:
    type: Recreate
  template:
    spec:
      ports:
      - name: default
        portPolicy: Dynamic
        containerPort: 7654
      health:
        initialDelaySeconds: 30
        periodSeconds: 60
      template:
        spec:
          containers:
          - name: debug
            image: debug
          nodeSelector:
            kubernetes.io/os: windows

with this kustomize image override:

images:
- name: debug
  newName: gcr.io/agones-images/simple-game-server # {"$imagepolicy": "flux-system:debug-image-policy:name"}
  newTag: "0.3" # {"$imagepolicy": "flux-system:debug-image-policy:tag"}
I ran the kustomized template above. The issue reproduced after a restart, with no calls made to any port. I assume the default is UDP.
I do not set limits, as in the example:

limits:
  memory: "64Mi"
  cpu: "20m"
So I assume that is the fix, but the sample could still yield the CPU somehow. For now it probably spins in a loop (a nice test, though - it eats a full core).
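For reference, "restricting" the fleet means adding a resources block to the game server container inside the Fleet template. This is a sketch mirroring the limits quoted above from the Agones example; the 64Mi/20m values are the example's defaults, not tuned for Windows:

```yaml
# Inner pod template of the Fleet spec (template.spec.template.spec)
containers:
- name: debug
  image: debug
  resources:
    requests:       # what the scheduler reserves for the pod
      memory: "64Mi"
      cpu: "20m"
    limits:         # hard cap enforced at runtime
      memory: "64Mi"
      cpu: "20m"
```

With a cpu limit set, a busy-looping container is throttled to 20m instead of consuming a full core, though the underlying loop is still wasteful.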
Operating system: Windows
Size: Standard_D2_v2 (2 instances)
AKS Base Windows Image (17763.1697.210129)
So it seems the fix is to set a limit, but maybe the server could also be made non-CPU-hungry as an option.
Follow up question, what does "eats too much CPU while doing nothing" specifically mean?
Do you mean how much is allocated to the process by the scheduler? Or do you mean you are looking at CPU graphs on the node and seeing spikes in usage?
I have a real, heavy game server with no players, and the simple game server. Neither was ever called via UDP or TCP. The real game server consumes 200m of CPU, while the simple game server consumes 990m (one core), as shown by kubectl top pod. I have the autoscaler enabled, and it seems that because of this I pay for more nodes than I should. I assume kubectl top pod produces the same data the cloud uses to allocate nodes on demand.
I think https://agones.dev/site/docs/advanced/limiting-resources/ will be relevant to this - it covers how to set compute resource requests and limits, and shows how K8s uses them for scheduling. Burst resource usage has no bearing on scheduling.
To note: On Linux, the default CPU request is 100m (from memory). I don't know if it is the same on Windows. It's generally a good idea to set this value so you have control over scheduling.
Running the example on Linux, I get the following output (which does have limits and request set):
root@7a0ef91906d8:/go/src/agones.dev/agones# kubectl top pods
NAME CPU(cores) MEMORY(bytes)
simple-game-server-5qt7z-jk7lw 22m 13Mi
simple-game-server-5qt7z-z2t8s 22m 13Mi
root@7a0ef91906d8:/go/src/agones.dev/agones#
Which seems about in line with the 20m request/limit combo that was set.
@markmandel It seems that you use for{}, which keeps the loop busy. Can we replace this with select{}, which blocks instead, or was there a reason it was done this way?
A cancelable context would probably be the best bet. Then rather than for {}, have <-ctx.Done() at the end of that function.
Seems like a good improvement 👍🏻
This should already be fixed by #3050 , closing.