agones end to end test: Webhook autoscaler to with fleet metadata

Continuation of https://github.com/googleforgames/agones/issues/3951

It would be good to have a e2e test for the webhook that looks and passes through metadata.

Feb 24 '25 21:02 markmandel

'This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions '

Apr 15 '25 10:04 github-actions[bot]

While using FeatureFleetAutoscaleRequestMetaData in a webhook policy, we rely on some specific logic to extract values from the incoming request: examples/autoscaler-webhook

https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/pkg/fleetautoscalers/fleetautoscalers.go#L190-L193

This logic is not currently present in the example/autoscaler-webhook implementation. In unit tests, we manually override the values: https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/examples/autoscaler-webhook/main.go#L154-L166

https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/pkg/fleetautoscalers/fleetautoscalers_test.go#L97-L104

But for e2e tests, we use the real example/autoscaler-webhook image: https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/test/e2e/fleetautoscaler_test.go#L663-L666

I suggest we add this metadata-handling logic to the example/autoscaler-webhook code and build a new version of the image. This will allow us to test FeatureFleetAutoscaleRequestMetaData behavior properly in e2e tests using the actual image.

Looking forward to your thoughts.

cc: @markmandel @igooch

Jul 08 '25 09:07 0xaravindh

@swermin If you have any relevant code snippets for example/autoscaler-webhook, please share them here as a comment. It would be helpful for referencing or incorporating them while updating the webhook logic.

Jul 15 '25 07:07 0xaravindh

@0xaravindh are you asking about incorporating something like the below (untested) code into the example?

func handleAutoscale(w http.ResponseWriter, r *http.Request) {
        // existing code

        if runtime.FeatureEnabled(runtime.FeatureFleetAutoscaleRequestMetaData) {
                if faReq.Request.Annotations != nil {
                        if value, ok := faReq.Request.Annotations["fixedReplicas"]; ok {
                                replicas, err := strconv.Atoi(value)
                                if err != nil {
                                        http.Error(w, err.Error(), http.StatusInternalServerError)
                                        return
                                }
                                faResp.Scale = true
                                faResp.Replicas = int32(replicas)
                        }
                }
        }

        if !faResp.Scale && faReq.Request.Status.Replicas != 0 {
                // existing scaling logic
        }

@swermin if you have a simple working example or code snippets that would be helpful in making sure this is properly tested.

Jul 16 '25 06:07 igooch

Thanks @igooch — yes, I implemented a similar logic snippet and built a custom image to test it in e2e. The tests passed successfully with the metadata-handling in place.

However, the actual example/autoscaler-webhook includes some existing logic based on thresholds and scale factors for upscaling/downscaling. So before replacing or overriding that logic, I wanted to confirm the expected behaviour

Also, I think we should add two env variables in example/autoscaler-webhook to help with this feature toggle and metadata support. If the FleetAutoscaleRequestMetaData feature is enabled, we can use the new metadata logic. Otherwise, we can stick to the existing scaling logic. WDYT @igooch @swermin

Just wanted to confirm this approach before making changes to the example webhook.

Once that’s clear, I can help update the webhook code and build the new image.

Jul 16 '25 07:07 0xaravindh

Hi @0xaravindh, I've been on vacation without access to a computer so its taken a bit of time for me to respond with an example.

So, for our case, we use the annotations to decide if the fleet has a TTL set or not, and use that to set the fleet size. So this is the snippet we are using at least:

const (
	DEFAULT_TTL_ENABLED  = false
	DEFAULT_TTL_SECONDS  = 60
	DEFAULT_BUFFER_TYPE  = "Percentage"
	DEFAULT_STEP         = 10
	DEFAULT_MIN_REPLICAS = 20
	DEFAULT_MAX_REPLICAS = 100
)

// getFleetSize retrieves the size of the fleet based on the FleetAutoscaleRequest.
func getFleetSize(req *autoscalingv1.FleetAutoscaleRequest) autoscalingv1.FleetAutoscaleResponse {
	fleet := fleetCache.get(req.Labels)
	if fleet == nil {
		logrus.WithField("fleet", req.Labels).Debug("Fleet not found in cache, returning default response")
		return autoscalingv1.FleetAutoscaleResponse{
			Scale:    false,
			Replicas: 0,
			UID:      req.UID,
		}
	}

	ttlEnabled := DEFAULT_TTL_ENABLED
	if value, ok := req.Annotations["autoscaling.ttl_enabled"]; ok {
		if parsed, err := strconv.ParseBool(value); err == nil {
			ttlEnabled = parsed
		}
	}

	if ttlEnabled {
		ttlSeconds := DEFAULT_TTL_SECONDS
		if value, ok := req.Annotations["autoscaling.ttl_seconds"]; ok {
			if parsed, err := strconv.Atoi(value); err == nil {
				ttlSeconds = parsed
			}
		}
		ttlExpiration := time.Duration(ttlSeconds) * time.Second

		if fleet.lastUpdatedTime.Add(ttlExpiration).Before(time.Now()) {
			// Fleet is expired, set the size to 0
			return autoscalingv1.FleetAutoscaleResponse{
				Scale:    false,
				Replicas: 0,
				UID:      req.UID,
			}
		}

		bufferType := DEFAULT_BUFFER_TYPE
		if value, ok := req.Annotations["autoscaling.buffer_type"]; ok {
			bufferType = value
		}

		size := 0
		switch bufferType {
			case "Percentage":
				size = percentageScale(req.Annotations)
			case "Fixed":
				size = fixedScale(req.Annotations)
			default:
				logrus.WithField("bufferType", bufferType).Error("Unknown buffer type")
				size = 0
		}
	}

	return autoscalingv1.FleetAutoscaleResponse{
		Scale:    req.Status.Replicas != size,
		Replicas: size,
		UID:      req.UID,
	}
}

As for how to add a good e2e test; I would do exactly as you described there. Having the annotations drive the logic if it is enabled, otherwise use the old way. (Which is what we are doing in our case)

Jul 16 '25 14:07 swermin