Adds HorizontalPodAutoscaler resource.

Open echoboomer opened this issue 3 years ago • 9 comments

What this PR does / why we need it: Adds an optional HorizontalPodAutoscaler object to the Buildkite Agent Helm chart. It is configured through a dictionary called types under the horizontalPodAutoscaler values key. The autoscaler is disabled by default.

Out of the box, you can specify cpu, memory, or custom inside this dictionary like this:

horizontalPodAutoscaler:
  enabled: true
  maxReplicas: 50
  minReplicas: 10
  types:
    cpu:
      type: Utilization # or AverageValue
      target: 80
    memory:
      type: AverageValue # or Utilization
      target: 80
    custom:
      metricName: requests-per-second
      type: AverageValue # or Value
      target: 2k
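
For reference, values like the ones above might render to roughly the following manifest. This is only a sketch: it assumes the autoscaling/v2beta2 API, a Deployment named buildkite-agent (a hypothetical name), and that the custom entry maps to a Pods metric. The actual template in this PR may differ. The memory entry is left out for brevity.

```yaml
# Sketch only: API version, object names, and the Pods metric mapping are assumptions.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: buildkite-agent            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: buildkite-agent          # hypothetical name
  minReplicas: 10
  maxReplicas: 50
  metrics:
    # From types.cpu (type: Utilization, target: 80)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
    # From types.custom (type: AverageValue, target: 2k)
    - type: Pods
      pods:
        metric:
          name: requests-per-second
        target:
          type: AverageValue
          averageValue: 2k
```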

This is documented in the readme in an additional section.

I also added the settings for podDisruptionBudget to the readme because they appeared to be missing.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer: This is likely less useful for cpu or memory scaling, but more useful for custom metrics scaling for something like Prometheus. This does not factor in the steps required to expose these metrics within the cluster - it only enables the chart to take advantage of them.

This may need additional testing as I am limited in test environments by the types of external or custom metrics I can use.

Checklist

[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]

  • [x] Chart Version bumped
  • [x] Variables are documented in the README.md

echoboomer, Sep 24 '20 18:09

Also of note, the horizontal-pod-autoscaler.yaml manifest will not render if types: {} is left at the default value of an empty dict.
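
A guard along these lines would produce that behavior, since Helm treats an empty dict as false in a condition. It is a sketch under that assumption, not necessarily the exact condition used in the template.

```yaml
{{- /* Sketch: render only when the HPA is enabled and types is non-empty. */ -}}
{{- if and .Values.horizontalPodAutoscaler.enabled .Values.horizontalPodAutoscaler.types }}
# HorizontalPodAutoscaler manifest body goes here
{{- end }}
```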

echoboomer, Sep 24 '20 18:09

Thanks for the PR. I cannot test it myself; maybe @toolmantim or @0x0I can validate it.

rimusz, Sep 24 '20 18:09

@toolmantim @0x0I any chance of this getting looked at soon? (This is of significant interest to our Buildkite infrastructure, as I'd like to move all agents into our Kube cluster.)

jack1902, Nov 13 '20 09:11

Hey got it and sure @jack1902 - will try to take a look today or at least over the weekend.

O1ahmad, Nov 13 '20 11:11

@0x0I I've left some comments, as I am hoping to move multiple queues into a Kube cluster via some already existing metrics outside the cluster within AWS. I have a diagram of the idea, which I am happy to share here, as I feel it might be a common pattern for numerous Buildkite Agent queues.

[Diagram: Buildkite Kube]

jack1902, Nov 13 '20 12:11

@jack1902 @0x0I I've taken a different approach here based on feedback. This is a bit simpler and leaves it up to the user to decide exactly what is passed in for the metrics section. It will only render if metrics are actually supplied, and will alert the user if max/min replica counts are not provided. This is documented in the readme. It also lets you override the apiVersion for the HPA, since that is practically a requirement given the spread of use cases.

This approach is a bit more open-ended and isn't as restrictive, and enables someone to do whatever they want with the settings.
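
As an illustration of this kind of pass-through template, here is a minimal sketch. The values keys (apiVersion, metrics, minReplicas, maxReplicas), the default API version, and the object names are assumptions for the example, not necessarily what the chart uses. It relies on Helm's required function to fail rendering with a message when a replica count is missing, and on toYaml to copy the user's metrics through unchanged.

```yaml
{{- /* Sketch only: values keys and object names below are hypothetical. */ -}}
{{- with .Values.horizontalPodAutoscaler }}
{{- if .metrics }}
apiVersion: {{ .apiVersion | default "autoscaling/v2beta2" }}
kind: HorizontalPodAutoscaler
metadata:
  name: {{ $.Release.Name }}-buildkite-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ $.Release.Name }}-buildkite-agent
  # required aborts rendering with the given message if the value is unset.
  minReplicas: {{ required "horizontalPodAutoscaler.minReplicas is required" .minReplicas }}
  maxReplicas: {{ required "horizontalPodAutoscaler.maxReplicas is required" .maxReplicas }}
  # The user-supplied metrics list is passed through verbatim.
  metrics:
    {{- toYaml .metrics | nindent 4 }}
{{- end }}
{{- end }}
```

With a template like this, the chart does not need to know about resource, pods, object, or external metric types; whatever list the user supplies under metrics ends up in the rendered spec.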

echoboomer, Dec 02 '20 01:12

this is great.

ianks, Dec 18 '20 21:12

@jack1902 in your diagram you have a worker node group, but wasn't the Horizontal Pod Autoscaler designed to scale pods?

hi-artem, Oct 13 '21 00:10

The agents ran on AWS instances. In EKS, if you consume all of your nodes, you need the cluster autoscaler to scale up your cluster to have additional nodes.

The line between the HPA and cluster autoscaler was there to signify that they work together in order to scale up the number of agents.

I never had the chance to implement this, but I had started to look into running Buildkite jobs as Kubernetes Jobs.

This meant agents only lived as long as the job.
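
A rough sketch of that idea, for illustration only: a Kubernetes Job that starts one agent, lets it run a single Buildkite job, and then exits. The image tag, Secret name, and key are hypothetical, and it assumes the agent's --disconnect-after-job flag and the image's default entrypoint behave as expected.

```yaml
# Sketch only: image tag, Secret name, and key are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: buildkite-agent-   # use kubectl create, since generateName does not work with apply
spec:
  backoffLimit: 0                  # do not retry the pod on failure
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: agent
          image: buildkite/agent:3
          # The agent disconnects and exits after running one Buildkite job.
          args: ["start", "--disconnect-after-job"]
          env:
            - name: BUILDKITE_AGENT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: buildkite-agent-token   # hypothetical Secret
                  key: token
```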

jack1902, Oct 13 '21 06:10