end to end test: Webhook autoscaler to with fleet metadata
Continuation of https://github.com/googleforgames/agones/issues/3951
It would be good to have a e2e test for the webhook that looks and passes through metadata.
'This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions '
While using FeatureFleetAutoscaleRequestMetaData in a webhook policy, we rely on some specific logic to extract values from the incoming request: examples/autoscaler-webhook
https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/pkg/fleetautoscalers/fleetautoscalers.go#L190-L193
This logic is not currently present in the example/autoscaler-webhook implementation. In unit tests, we manually override the values: https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/examples/autoscaler-webhook/main.go#L154-L166
https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/pkg/fleetautoscalers/fleetautoscalers_test.go#L97-L104
But for e2e tests, we use the real example/autoscaler-webhook image: https://github.com/googleforgames/agones/blob/1865ad20052e916beafc8c8ecdec683722fa083d/test/e2e/fleetautoscaler_test.go#L663-L666
I suggest we add this metadata-handling logic to the example/autoscaler-webhook code and build a new version of the image. This will allow us to test FeatureFleetAutoscaleRequestMetaData behavior properly in e2e tests using the actual image.
Looking forward to your thoughts.
cc: @markmandel @igooch
@swermin If you have any relevant code snippets for example/autoscaler-webhook, please share them here as a comment. It would be helpful for referencing or incorporating them while updating the webhook logic.
@0xaravindh are you asking about incorporating something like the below (untested) code into the example?
func handleAutoscale(w http.ResponseWriter, r *http.Request) {
// existing code
if runtime.FeatureEnabled(runtime.FeatureFleetAutoscaleRequestMetaData) {
if faReq.Request.Annotations != nil {
if value, ok := faReq.Request.Annotations["fixedReplicas"]; ok {
replicas, err := strconv.Atoi(value)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
faResp.Scale = true
faResp.Replicas = int32(replicas)
}
}
}
if !faResp.Scale && faReq.Request.Status.Replicas != 0 {
// existing scaling logic
}
@swermin if you have a simple working example or code snippets that would be helpful in making sure this is properly tested.
Thanks @igooch — yes, I implemented a similar logic snippet and built a custom image to test it in e2e. The tests passed successfully with the metadata-handling in place.
However, the actual example/autoscaler-webhook includes some existing logic based on thresholds and scale factors for upscaling/downscaling. So before replacing or overriding that logic, I wanted to confirm the expected behaviour
Also, I think we should add two env variables in example/autoscaler-webhook to help with this feature toggle and metadata support. If the FleetAutoscaleRequestMetaData feature is enabled, we can use the new metadata logic. Otherwise, we can stick to the existing scaling logic. WDYT @igooch @swermin
Just wanted to confirm this approach before making changes to the example webhook.
Once that’s clear, I can help update the webhook code and build the new image.
Hi @0xaravindh, I've been on vacation without access to a computer so its taken a bit of time for me to respond with an example.
So, for our case, we use the annotations to decide if the fleet has a TTL set or not, and use that to set the fleet size. So this is the snippet we are using at least:
const (
DEFAULT_TTL_ENABLED = false
DEFAULT_TTL_SECONDS = 60
DEFAULT_BUFFER_TYPE = "Percentage"
DEFAULT_STEP = 10
DEFAULT_MIN_REPLICAS = 20
DEFAULT_MAX_REPLICAS = 100
)
// getFleetSize retrieves the size of the fleet based on the FleetAutoscaleRequest.
func getFleetSize(req *autoscalingv1.FleetAutoscaleRequest) autoscalingv1.FleetAutoscaleResponse {
fleet := fleetCache.get(req.Labels)
if fleet == nil {
logrus.WithField("fleet", req.Labels).Debug("Fleet not found in cache, returning default response")
return autoscalingv1.FleetAutoscaleResponse{
Scale: false,
Replicas: 0,
UID: req.UID,
}
}
ttlEnabled := DEFAULT_TTL_ENABLED
if value, ok := req.Annotations["autoscaling.ttl_enabled"]; ok {
if parsed, err := strconv.ParseBool(value); err == nil {
ttlEnabled = parsed
}
}
if ttlEnabled {
ttlSeconds := DEFAULT_TTL_SECONDS
if value, ok := req.Annotations["autoscaling.ttl_seconds"]; ok {
if parsed, err := strconv.Atoi(value); err == nil {
ttlSeconds = parsed
}
}
ttlExpiration := time.Duration(ttlSeconds) * time.Second
if fleet.lastUpdatedTime.Add(ttlExpiration).Before(time.Now()) {
// Fleet is expired, set the size to 0
return autoscalingv1.FleetAutoscaleResponse{
Scale: false,
Replicas: 0,
UID: req.UID,
}
}
bufferType := DEFAULT_BUFFER_TYPE
if value, ok := req.Annotations["autoscaling.buffer_type"]; ok {
bufferType = value
}
size := 0
switch bufferType {
case "Percentage":
size = percentageScale(req.Annotations)
case "Fixed":
size = fixedScale(req.Annotations)
default:
logrus.WithField("bufferType", bufferType).Error("Unknown buffer type")
size = 0
}
}
return autoscalingv1.FleetAutoscaleResponse{
Scale: req.Status.Replicas != size,
Replicas: size,
UID: req.UID,
}
}
As for how to add a good e2e test; I would do exactly as you described there. Having the annotations drive the logic if it is enabled, otherwise use the old way. (Which is what we are doing in our case)