
Simple game server eats too much CPU

dzmitry-lahoda opened this issue 2 years ago • 6 comments

What happened:

The simple game server consumes a lot of CPU while doing nothing.

What you expected to happen:

A deployment without CPU limits should not use noticeable CPU until I send it a heavy load of requests via netcat.

How to reproduce it (as minimally and precisely as possible):

Deploy the simple game server on Windows in a fleet without CPU limits. It uses a whole core, while our full-blown production game server uses only about a quarter of one.

Anything else we need to know?:

The debug pods below run simple-game-server 0.3 from gcr.io (kubectl top pod output):

debug-dev-mw48w-48jxr                                     891m         117Mi
debug-dev-mw48w-lm6tq                                     884m         118Mi

The simple game server is running on Windows.

Environment:

  • Agones version: 1.15
  • Kubernetes version (use kubectl version): 1.18
  • Cloud provider or hardware configuration: Azure
  • Install method (yaml/helm): yaml
  • Troubleshooting guide log(s):
  • Others:

dzmitry-lahoda, Aug 02 '21 10:08

Hi @dzmitry-lahoda - thanks for filing this bug report. Can you provide a bit more information / clarification for me?

  • What do you mean by "unrestricted deployment" and "restricted fleet"?
  • Are you running with udp only (the default), tcp only, or both?
  • Are you deploying the game server in a fleet with the example yaml file or are you using a different config?
  • What version of Windows are you running on the node?

Thanks!

roberthbailey, Aug 03 '21 17:08

I did not set a CPU limit in the game server template. I will set one as the fix, i.e. "restrict" it.

apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: debug
spec:  
  strategy:
    type: Recreate
  template:
    spec:
      ports:
      - name: default
        portPolicy: Dynamic
        containerPort: 7654
      health:
        initialDelaySeconds: 30
        periodSeconds: 60
      template:
        spec:
          containers: 
          - name: debug
            image: debug
          nodeSelector:
            kubernetes.io/os: windows    
# kustomization.yaml (image override used by Kustomize / Flux image automation)
images:
- name: debug
  newName: gcr.io/agones-images/simple-game-server # {"$imagepolicy": "flux-system:debug-image-policy:name"}
  newTag: "0.3" # {"$imagepolicy": "flux-system:debug-image-policy:tag"}

I ran the kustomized template above. The issue reproduced after a restart, with no calls made to any port. I assume the default protocol is UDP.

I do not set limits the way the example does:

resources:
  limits:
    memory: "64Mi"
    cpu: "20m"

So I assume that is the fix, but the sample could still yield the CPU somehow. As it is, it probably spins in a loop (which makes it a nice test for eating a full core).

Operating system: Windows
Size: Standard_D2_v2 (2 instances)
AKS Base Windows Image (17763.1697.210129)

dzmitry-lahoda, Aug 03 '21 18:08

So it seems the fix is to set a limit, but maybe the server could also be made less CPU-hungry, as an option.

dzmitry-lahoda, Aug 03 '21 18:08

Follow-up question: what does "eats too much CPU while doing nothing" specifically mean?

Do you mean how much is allocated to the process by the scheduler? Or do you mean you are looking at CPU graphs on the node and seeing spikes in usage?

markmandel, Aug 06 '21 21:08

I have a real, heavy game server with no players and the simple game server. Neither was ever called over UDP or TCP. According to kubectl top pod, the real game server consumes 200m CPU, while the simple game server consumes 990m (about one core). I have the autoscaler enabled, and it seems that because of this I pay for more nodes than I should. I assume kubectl top pod reports the same data the cloud uses to allocate nodes on demand.

dzmitry-lahoda, Aug 07 '21 06:08

I assume that kubectl top pod produced same data which is used by cloud to allocated nodes on demand.

I think https://agones.dev/site/docs/advanced/limiting-resources/ will be relevant here: it covers how to set compute resource requests and limits, and shows how K8s uses the requests and limits for scheduling. Scheduling is based on the requests; burst resource usage has no bearing on it.

To note: On Linux, the default CPU request is 100m (from memory). I don't know if it is the same on Windows. It's generally a good idea to set this value so you have control over scheduling.

Running the example on Linux, I get the following output (the example does have limits and requests set):

root@7a0ef91906d8:/go/src/agones.dev/agones# kubectl top pods
NAME                             CPU(cores)   MEMORY(bytes)
simple-game-server-5qt7z-jk7lw   22m          13Mi
simple-game-server-5qt7z-z2t8s   22m          13Mi
root@7a0ef91906d8:/go/src/agones.dev/agones#

That seems about in line with the 20m request/limit combination that is set.
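For reading the kubectl top numbers in this thread: Kubernetes expresses CPU in millicores, where 1000m is one full core, so 891m is just under one core and 20m is 2% of a core. The sketch below converts such quantity strings. cpuToMillicores is a hypothetical helper that only understands the "m" suffix and plain decimal core counts; real code would use resource.ParseQuantity from k8s.io/apimachinery instead.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// cpuToMillicores is a hypothetical, simplified parser for Kubernetes CPU
// quantities such as "20m", "891m", "0.5" or "2". It returns millicores
// (1 core = 1000m). Only the "m" suffix and plain decimal core counts are
// handled; use resource.ParseQuantity from k8s.io/apimachinery in real code.
func cpuToMillicores(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		// "891m" is already expressed in millicores.
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	// "0.5" or "2" are whole or fractional cores.
	cores, err := strconv.ParseFloat(q, 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(cores * 1000)), nil
}

func main() {
	// Values taken from the kubectl top output and the 20m limit in this thread.
	for _, q := range []string{"891m", "884m", "22m", "20m"} {
		m, err := cpuToMillicores(q)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s = %.3f cores\n", q, float64(m)/1000)
	}
}
```

For the Windows pods in the report, 891m and 884m each work out to roughly 0.89 of a core, compared with about 0.02 of a core for the limited Linux pods.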

markmandel, Aug 09 '21 19:08

@markmandel It seems that you use for {}, which keeps the CPU busy spinning. Can we replace it with select {}, which blocks instead, or was there a reason it was done this way?

icedream, Feb 14 '23 10:02

A cancelable context would probably be the best bet. Then, rather than for {}, have <-ctx.Done() at the end of that function.

Seems like a good improvement 👍🏻

markmandel, Feb 14 '23 18:02

This should already be fixed by #3050; closing.

gongmax, Apr 26 '23 21:04