
[Feature Proposal] - SDK Support In-Place Pod Resize

Open markmandel opened this issue 7 months ago • 6 comments

Is your feature request related to a problem? Please describe.

The ability to use the Agones SDK to facilitate in-place Pod resizing.

See: https://kubernetes.io/blog/2025/05/16/kubernetes-v1-33-in-place-pod-resize-beta/

Describe the solution you'd like

I haven't worked out the specifics yet (this is a bit of a placeholder), but we'd want some way to use the SDK to change the backing Pod's resources in place.

Describe alternatives you've considered

Force people to do this through a hand-written controller.

Additional context

People have wanted this since the beginning of Agones.

Link to the Agones Feature Proposal (if any)

None so far.

Discussion Link (if any)

N/A

markmandel • May 26 '25

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

github-actions[bot] • Jul 01 '25

We should add this functionality. Adding the awaiting-maintainer label.

igooch • Jul 01 '25

It's fine - people have been asking me about this since 2018 😁

markmandel • Jul 01 '25

I don't have time to work on this at the moment, and nothing below has been tested, but here are some thoughts while they're fresh:

SDK Changes

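A new gRPC endpoint such as `Resize(ResizeRequest) returns (Empty)`:

```protobuf
import "google/api/annotations.proto";
import "google/protobuf/empty.proto";

service SDK {
  // ... existing SDK RPCs ...

  rpc Resize(ResizeRequest) returns (google.protobuf.Empty) {
    option (google.api.http) = {
      patch: "/v1alpha1/gameserver/resize"
      body: "*"
    };
  }
}
```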

```protobuf
// Represents a set of compute resource requirements that can be requested.
// All fields are optional. Only fields that are set in a request will be considered for an update.
message ResizableResources {
  // The desired CPU, in Kubernetes resource quantity format (e.g. "1000m", "1.5").
  string cpu = 1;
  // The desired Memory, in Kubernetes resource quantity format (e.g. "512Mi", "2Gi").
  string memory = 2;
}

// A request to resize the running game server's Pod.
// If this message or its fields are not set, the current requests and limits are not changed.
message ResizeRequest {
  ResizableResources requests = 1;
  ResizableResources limits = 2;
}
```
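The `cpu` and `memory` strings are Kubernetes resource quantities, so at some point they need to be normalized before they can be compared against a policy. A minimal Python sketch covering only the forms used in this thread (`m` millicores, plain decimals, and the binary `Ki`/`Mi`/`Gi`/`Ti` suffixes); `parse_cpu` and `parse_memory` are hypothetical helpers, not part of the Agones SDK. A real Go implementation would use `resource.Quantity` from `k8s.io/apimachinery`, which accepts many more forms.

```python
from decimal import Decimal, ROUND_CEILING

# Binary suffixes used for memory in this thread (a subset of what Kubernetes accepts).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def _to_int(value: Decimal) -> int:
    # Round any fractional remainder up to a whole unit.
    return int(value.to_integral_value(rounding=ROUND_CEILING))


def parse_cpu(quantity: str) -> int:
    """Return a CPU quantity such as "1000m" or "1.5" in millicores."""
    if quantity.endswith("m"):
        return _to_int(Decimal(quantity[:-1]))
    return _to_int(Decimal(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Return a memory quantity such as "512Mi" or "1.5Gi" in bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return _to_int(Decimal(quantity[: -len(suffix)]) * factor)
    return _to_int(Decimal(quantity))  # plain byte count
```

Anything outside that subset (e.g. decimal suffixes like `512M`) raises `decimal.InvalidOperation` here, which is arguably the right behaviour for a sketch: reject rather than guess.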

The sidecar will receive the Resize gRPC call and will be responsible for communicating the resize request to the Agones control plane.
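One way for the sidecar to honour "unset fields leave resources unchanged" is to forward only the fields the game server actually set. Since proto3 `string` fields default to `""`, an empty string can be treated as "not set". A Python sketch that models the request as plain dicts and produces a JSON merge patch against the GameServer's `status.resize`; the dict shapes and `build_status_patch` are illustrative assumptions, not existing Agones code.

```python
import json
from typing import Optional


def _set_fields(resources: Optional[dict]) -> dict:
    # proto3 strings default to "", so empty values are treated as "not set".
    if not resources:
        return {}
    return {name: value for name, value in resources.items() if value}


def build_status_patch(requests: Optional[dict], limits: Optional[dict]) -> Optional[str]:
    """Return a JSON merge patch for status.resize, or None if nothing was set."""
    resize = {}
    for key, resources in (("requests", requests), ("limits", limits)):
        fields = _set_fields(resources)
        if fields:
            resize[key] = fields
    if not resize:
        return None  # nothing to change, so the sidecar can skip the API call
    return json.dumps({"status": {"resize": resize}}, sort_keys=True)
```

Because a merge patch leaves omitted keys alone, a value from an earlier request would survive into the next one; that is one argument for the controller clearing `status.resize` after processing, as suggested below.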

API Changes

We could mutate the GameServer spec after an update has been confirmed. Instead, the proposal below adds new fields to GameServer.Status, similar to how we currently handle other changes to the GameServer at runtime.

```yaml
# In GameServerStatus
resize:
  # Requested resource values from the SDK. The controller will process this.
  # [Optional] Cleared by the controller after processing.
  requests:
    cpu: "1500m"
    memory: "1.5Gi"
  limits:
    cpu: "2000m"
    memory: "2Gi"
# [Optional] New field reflecting the *actual* allocated resources, as reported in the Pod's container status.
# This is populated by the controller after observing the Pod status.
allocatedResources:
  requests:
    cpu: "1500m"
    memory: "1.5Gi"
  limits:
    cpu: "2000m"
    memory: "2Gi"
```

Allowing a game server to request arbitrary resources poses a risk. This will be mitigated by introducing a resize policy at the Fleet level, similar to Buffer settings.

```yaml
# In Fleet.Spec.Template.Spec
policy:
  resize:
    # Defines the min/max resources a GameServer can request.
    min:
      requests:
        cpu: "500m"
    max:
      requests:
        cpu: "4000m"
```
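A sketch of the Fleet-level bounds check the controller could run before touching the Pod. To keep it short it assumes quantities have already been normalized to integers (millicores for CPU, bytes for memory) and that a resource with no bound in the policy is unrestricted; `ResizePolicy` and `violations` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# resource name -> normalized integer value, e.g. {"cpu": 1500} (millicores)
Resources = Dict[str, int]


@dataclass
class ResizePolicy:
    min_requests: Resources = field(default_factory=dict)
    max_requests: Resources = field(default_factory=dict)


def violations(policy: ResizePolicy, requested: Resources) -> List[str]:
    """Return reasons the request breaks the policy (an empty list means it is valid)."""
    problems = []
    for name, value in sorted(requested.items()):
        low = policy.min_requests.get(name)
        high = policy.max_requests.get(name)
        if low is not None and value < low:
            problems.append(f"{name} request {value} is below minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name} request {value} is above maximum {high}")
    return problems
```

With the policy above (`min` 500m, `max` 4000m CPU), a 1500m request passes and a 100m request is reported. Whether an out-of-bounds request should be rejected or clamped to the nearest bound is still open; this sketch only reports.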

Agones Controller Changes

The GameServer controller will contain the core logic for applying the resize operation.

  • Reuse the existing logic for syncing the Pod based on client SDK changes: https://github.com/googleforgames/agones/blob/da1e92896ec6fe6c056a1ede694f44b3266e4feb/pkg/gameservers/controller.go#L451-L453
  • Validate any incoming request against a ResourcePolicy defined in the Fleet to prevent game servers from requesting excessive resources.
  • If the request is valid, the controller will patch the spec.containers[].resources of the underlying Pod object associated with the GameServer, through the Pod's resize subresource.
  • The controller should monitor the Pod's status. As of Kubernetes v1.33 the pod.status.resize field is deprecated; resize progress is reported through the PodResizePending (reason Deferred or Infeasible) and PodResizeInProgress Pod conditions.
  • Once the resize is complete (either successfully or not), the controller will update the GameServer.Status.allocatedResources with the new, actual resource values from the Pod and clear the status.resize request fields.
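As of Kubernetes v1.33, resize progress is surfaced through the `PodResizePending` (reason `Deferred` or `Infeasible`) and `PodResizeInProgress` Pod conditions. A Python sketch of how the controller could turn those into a decision, with the Pod modeled as a plain dict shaped like its Kubernetes JSON; `resize_outcome` and the outcome strings are hypothetical. It assumes spec and container-status resources are both in the API server's canonical quantity form, so they can be compared directly.

```python
from typing import Dict, List


def _condition(conditions: List[dict], kind: str) -> dict:
    # Return the condition of the given type if it is currently True, else {}.
    for cond in conditions:
        if cond.get("type") == kind and cond.get("status") == "True":
            return cond
    return {}


def resize_outcome(pod: dict) -> str:
    """Classify a resize as "infeasible", "deferred", "in_progress", "waiting" or "done"."""
    status = pod.get("status", {})
    conditions = status.get("conditions", [])

    pending = _condition(conditions, "PodResizePending")
    if pending:
        # The kubelet can't apply the resize now (Deferred) or at all on this node (Infeasible).
        return "infeasible" if pending.get("reason") == "Infeasible" else "deferred"
    if _condition(conditions, "PodResizeInProgress"):
        return "in_progress"

    # No resize condition: only report "done" once actual resources match the spec,
    # so a resize the kubelet hasn't picked up yet isn't mistaken for a finished one.
    desired: Dict[str, dict] = {
        c["name"]: c.get("resources", {}) for c in pod.get("spec", {}).get("containers", [])
    }
    actual: Dict[str, dict] = {
        c["name"]: c.get("resources", {}) for c in status.get("containerStatuses", [])
    }
    return "done" if desired == actual else "waiting"
```

On `done` or `infeasible` the controller would copy `containerStatuses[].resources` into `GameServer.Status.allocatedResources` and clear `status.resize`; on the other outcomes it would requeue and check again.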

igooch • Jul 16 '25

That could be pretty handy for Counters and Lists / reusing game servers (sessions)! I'll play with it. I've got another task in progress, but will jump on this one right after.

lacroixthomas • Aug 16 '25

I'm currently implementing a POC to see whether anything is missing, and there might be one thing:

We should probably also track which container to resize: if a Pod has multiple containers, users might want to resize one specific container rather than all of them.

SDK Changes:

```protobuf
message ResizeRequest {
  // Name of the container to resize.
  string container = 1;
  ResizableResources requests = 2;
  ResizableResources limits = 3;
}
```

API Changes:

```yaml
# In GameServerStatus
resize:
  game-server:
    requests:
      cpu: "1500m"
      memory: "1.5Gi"
    limits:
      cpu: "2000m"
      memory: "2Gi"
  logger-sidecar:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "200m"
      memory: "256Mi"

allocatedResources:
  game-server:
    requests:
      cpu: "1500m"
      memory: "1.5Gi"
    limits:
      cpu: "2000m"
      memory: "2Gi"
  logger-sidecar:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "200m"
      memory: "256Mi"
```
```yaml
# In Fleet
spec:
  template:
    spec:
      policy:
        resize:
          min:
            game-server:
              requests:
                cpu: "500m"
                memory: "256Mi"
              limits:
                cpu: "1000m"
                memory: "512Mi"
            logger-sidecar:
              requests:
                cpu: "50m"
                memory: "64Mi"
              limits:
                cpu: "100m"
                memory: "128Mi"
          max:
            game-server:
              requests:
                cpu: "4000m"
                memory: "8Gi"
              limits:
                cpu: "8000m"
                memory: "16Gi"
            logger-sidecar:
              requests:
                cpu: "500m"
                memory: "512Mi"
              limits:
                cpu: "1000m"
                memory: "1Gi"
```
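With a container name in the request, the controller has to target that container and reject names that don't exist in the Pod. A Python sketch of building the strategic merge patch body for the Pod's `resize` subresource (the same shape the Kubernetes docs pass to `kubectl patch pod <name> --subresource resize --patch ...`). `build_resize_patch` is hypothetical, and falling back to the GameServer's `spec.container` when no name is given is an assumption, not something decided in this thread.

```python
import json
from typing import Dict, List


def build_resize_patch(pod_containers: List[str], container: str,
                       requests: Dict[str, str], limits: Dict[str, str],
                       default_container: str) -> str:
    """Return a strategic merge patch that resizes a single named container.

    An empty `container` falls back to `default_container` (e.g. the GameServer's
    spec.container). Unknown names raise ValueError so a typo in the SDK call
    fails loudly instead of resizing nothing.
    """
    name = container or default_container
    if name not in pod_containers:
        raise ValueError(f"container {name!r} not found in pod")
    resources: Dict[str, Dict[str, str]] = {}
    if requests:
        resources["requests"] = dict(requests)
    if limits:
        resources["limits"] = dict(limits)
    if not resources:
        raise ValueError("resize request sets no resources")
    # Strategic merge patches match spec.containers entries by name,
    # so the Pod's other containers are left untouched.
    patch = {"spec": {"containers": [{"name": name, "resources": resources}]}}
    return json.dumps(patch, sort_keys=True)
```

For the example above, resizing only `logger-sidecar` produces a patch that mentions just that container, leaving `game-server` as it was.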

@markmandel @igooch What do you think?

lacroixthomas • Sep 28 '25

This just moved to stable in Kubernetes v1.35: https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/#stable-in-place-update-of-pod-resources

markmandel • Dec 18 '25