flagger
flagger copied to clipboard
Prevent primary hpa collision for keda scaled objects when migrating from an hpa
This MR fixes https://github.com/fluxcd/flagger/issues/1646 where Flagger fails to create the primary hpa for a keda scaled object when you are migrating from a regular hpa to a scaled object in a canary.
Using the scaledobject.keda.sh/transfer-hpa-ownership: "true"
annotation, a keda scaled object can take ownership of an already existing HPA. Currently, annotations are not copied over from the canary scaled object to the primary scaled object, so I am adding this functionality.
Additionally, we also need to add the Advanced.HorizontalPodAutoscalerConfig.Name
field to the ScaledObjectSpec
so that we can specify the HPA name that the scaled object will take ownership of. When creating the primary scaled object from the canary, we use the same hpa name but append -primary
to the end of it.
Testing
With the above changes, we can then successfully use the transfer hpa ownership annotation and specify the HPA that the scaled objects will take ownership of. This prevents KEDA from failing to create the new primary hpa since it already exists. Instead it just uses and modifies the already existing one.
Applying the following diff to test in a kubernetes cluster:
fortio, fortio-server-deployment-2, Canary (flagger.app) has changed:
# Source: fortio/templates/fortio.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: fortio-server-deployment-2
namespace: fortio
spec:
analysis:
interval: 30s
maxWeight: 50
metrics:
- interval: 30s
name: request-success-rate
thresholdRange:
min: 99
stepWeight: 10
threshold: 10
webhooks:
- metadata:
async: "on"
c: "6"
labels: fortio-server-deployment-2
load: Start
nocatchup: "on"
qps: "12"
save: "on"
stdclient: "on"
t: 600s
uniform: "on"
url: http://fortio-server-deployment-2-canary.fortio.svc.cluster.local:8080/
name: generateLoad
timeout: 60s
type: pre-rollout
url: http://fortio-client.fortio.svc.cluster.local:8080/fortio/rest/run?jsonPath=.metadata
progressDeadlineSeconds: 600
autoscalerRef:
- apiVersion: autoscaling/v2
- kind: HorizontalPodAutoscaler
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
name: fortio-server-deployment-2
targetRef:
apiVersion: apps/v1
kind: Deployment
name: fortio-server-deployment-2
service:
port: 8080
targetPort: 8080
fortio, fortio-server-deployment-2, HorizontalPodAutoscaler (autoscaling) has been removed:
- # Source: fortio/templates/fortio.yaml
- apiVersion: autoscaling/v2
- kind: HorizontalPodAutoscaler
- metadata:
- name: fortio-server-deployment-2
- namespace: fortio
- spec:
- maxReplicas: 4
- metrics:
- - resource:
- name: cpu
- target:
- averageUtilization: 99
- type: Utilization
- type: Resource
- minReplicas: 2
- scaleTargetRef:
- apiVersion: apps/v1
- kind: Deployment
- name: fortio-server-deployment-2
+
fortio, fortio-server-deployment-2, ScaledObject (keda.sh) has been added:
-
+ # Source: fortio/templates/fortio.yaml
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
+ metadata:
+ name: fortio-server-deployment-2
+ namespace: fortio
+ annotations:
+ scaledobject.keda.sh/transfer-hpa-ownership: "true"
+ spec:
+ advanced:
+ horizontalPodAutoscalerConfig:
+ name: fortio-server-deployment-2
+ scaleTargetRef:
+ name: fortio-server-deployment-2
+ minReplicaCount: 2
+ maxReplicaCount: 4
+ triggers:
+ - type: prometheus
+ metadata:
+ serverAddress: http://kube-prometheus-stack-prometheus:9090/prometheus
+ threshold: '100'
+ query: sum(rate(istio_requests_total{destination_service_name="fortio-server-2"}[1m]))
+ - type: cpu
+ metricType: Utilization
+ metadata:
+ value: "99"
See that the primary scaled object successfully takes over ownership of the primary hpa and it no longer fails.
(⎈|arn:aws:eks:us-east-2:824163854805:cluster/rokumesh-auto-jgeisler-t7:fortio)jgeisler@PK4V290RH9 platform % k get so
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK PAUSED AGE
fortio-server-deployment-2 apps/v1.Deployment fortio-server-deployment-2 2 4 prometheus True Unknown False True 170m
fortio-server-deployment-2-primary apps/v1.Deployment fortio-server-deployment-2-primary 2 4 prometheus True False False Unknown 170m
(⎈|arn:aws:eks:us-east-2:824163854805:cluster/rokumesh-auto-jgeisler-t7:fortio)jgeisler@PK4V290RH9 platform % k get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
fortio-server-deployment-2-primary Deployment/fortio-server-deployment-2-primary 0/100 (avg), 4%/99% 2 4 2 13d