[Feature] Add support for Horizontal Pod Autoscaling
Hello! Please add support for Horizontal Pod Autoscaling. This is the current output I see:
│ 100m -> ? (No data) │ 100m -> ? (No data) │ │ 100Mi -> ? (No data) │ 100Mi -> ? (No data) │
[truncated krr results table: every CPU and memory request/limit recommendation reads "? (No data)" or "? (Not enough data)"]
Hey all, we're still figuring out the appropriate algorithm to use in this case. (The usual logic doesn't work if you're scaling according to CPU/memory utilization.)
To help, we'd love to hear from each of you:
- How is your HPA defined? What metric do you use to scale?
- What logic would make the most sense for recommendations when using the HPA?
I use the standard CPU request and memory request utilization metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    meta.helm.sh/release-name: application
    meta.helm.sh/release-namespace: application
  creationTimestamp: "2023-07-24T19:28:02Z"
  labels:
    app-name: application
    app-version: release-f469738c
    app.kubernetes.io/managed-by: Helm
  name: application
  namespace: application
  resourceVersion: "249236825"
  uid: 15d78a48-a594-4908-8723-82ba7291f6b0
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 60
        type: Pods
        value: 4
      - periodSeconds: 60
        type: Percent
        value: 25
      selectPolicy: Max
    scaleUp:
      policies:
      - periodSeconds: 60
        type: Pods
        value: 4
      - periodSeconds: 60
        type: Percent
        value: 25
      selectPolicy: Max
      stabilizationWindowSeconds: 0
  maxReplicas: 25
  metrics:
  - resource:
      name: memory
      target:
        averageUtilization: 80
        type: Utilization
    type: Resource
  - resource:
      name: cpu
      target:
        averageUtilization: 80
        type: Utilization
    type: Resource
  minReplicas: 6
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application
status:
  conditions:
  - lastTransitionTime: "2023-07-24T19:28:17Z"
    message: recommended size matches current size
    reason: ReadyForNewScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2023-11-16T19:17:09Z"
    message: the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  - lastTransitionTime: "2023-11-16T18:59:50Z"
    message: the desired replica count is less than the minimum replica count
    reason: TooFewReplicas
    status: "True"
    type: ScalingLimited
  currentMetrics:
  - resource:
      current:
        averageUtilization: 59
        averageValue: 457606485333m
      name: memory
    type: Resource
  - resource:
      current:
        averageUtilization: 18
        averageValue: 202m
      name: cpu
    type: Resource
  currentReplicas: 6
  desiredReplicas: 6
  lastScaleTime: "2023-11-16T18:04:52Z"
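As a side note on the status above: Kubernetes scales with the documented formula desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), which is exactly why this HPA reports TooFewReplicas. A quick check in Python:

import math

# Kubernetes' documented HPA formula:
#   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
current_replicas = 6
target = 80  # spec: target.averageUtilization for both metrics

for name, current in [("memory", 59), ("cpu", 18)]:
    desired = math.ceil(current_replicas * current / target)
    print(f"{name}: desired replicas = {desired}")

# memory: ceil(6 * 59 / 80) = 5; cpu: ceil(6 * 18 / 80) = 2.
# The max across metrics (5) is still below minReplicas (6), hence the
# TooFewReplicas condition and the replica count staying pinned at 6.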
Thanks. In a case like this, we can't just use the standard KRR strategy, because it leads to unstable recommendations.
E.g. if the HPA sets target.averageUtilization: 80 for CPU, but KRR is outputting CPU recommendations at the 99th percentile, then KRR's recommendations can cause the HPA to scale constantly, and the HPA's behaviour can in turn change KRR's recommendations.
We likely need a strategy that takes both into account from the start.
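To make that feedback loop concrete, here is a toy simulation. Everything in it is an illustrative assumption, not KRR's actual strategy: the demand number, and the rule "set the request to observed peak usage" standing in for a p99-based recommendation.

import math

# Toy model of a utilization-targeting HPA interacting with a
# percentile-based request recommender. Illustrative only.
TARGET = 0.80          # HPA target.averageUtilization as a fraction
TOTAL_DEMAND = 1200.0  # assumed steady aggregate CPU demand, millicores

replicas, request = 6, 100.0
for step in range(6):
    usage = TOTAL_DEMAND / replicas  # per-pod usage
    utilization = usage / request
    # HPA step: scale replicas toward the target utilization.
    replicas = max(1, math.ceil(replicas * utilization / TARGET))
    # Recommender step: size the request off observed usage (peak ~ p99 here).
    request = usage
    print(f"step {step}: replicas={replicas} request={request:.0f}m "
          f"utilization={utilization:.0%}")

# The two controllers chase each other: setting request == usage pushes
# utilization to ~100%, above the 80% target, so the HPA scales up; more
# replicas lowers per-pod usage, the next recommendation lowers the
# request, and the loop oscillates instead of converging.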
Hi, for deployments where HPA is enabled, instead of providing recommendations for the deployment, could it provide recommendations per pod? Almost every pod behind the HPA gets the same memory and CPU utilization.
Hello @aantn, is there any plan to add support for HPA resources?
Yep, the tricky part is - as always - what the algorithm should be.
We have some ideas for the algorithm, but for starters we're going to add an --allow-hpa flag so that you can opt in to the current algorithm anyway: "give me a recommendation even though the pod uses the HPA, it's OK with me".
And if you have input on what an HPA-optimized algorithm should look like, I'd love to chat!
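For illustration only, here is one direction such an HPA-optimized algorithm could take. This is a hypothetical sketch, not krr's code; the function name and parameters are made up for this example. The idea: size the request off typical usage and the HPA's own target, and let the HPA absorb peaks by adding replicas.

def hpa_aware_cpu_request(typical_usage_mcores: float,
                          target_utilization: float) -> float:
    """Pick a request so steady-state utilization lands on the HPA target.

    typical_usage_mcores: per-pod usage at a low percentile (e.g. p50).
    target_utilization: HPA target.averageUtilization as a fraction (0-1).
    """
    return typical_usage_mcores / target_utilization

# Example: pods typically use ~200m and the HPA targets 80% utilization,
# so a 250m request keeps steady-state utilization near the target while
# peaks are handled by scaling out rather than by per-pod headroom.
print(hpa_aware_cpu_request(200, 0.80))  # -> 250.0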
Thank you for your prompt response, having the flag is a great start in my opinion.
Of course. We'll have a PR fairly soon.
Great stuff @aantn, what are the plans & timeline for releasing it?
Can you try --allow_hpa on #226?
Hello @aantn,
I have tested the --allow_hpa flag and now I can see CPU & Memory recommendations for the workloads that we have HPA configured for.
Excellent. Happy to hear it!
Thank you @aantn for the quick implementation
Which version does the allow_hpa flag work with?
Only on the branch in #226. Until we merge and do a release, you'll have to check it out locally and run from source, according to the instructions in the README.
How can I download a binary from this branch without downloading the entire project?
It's not possible at the moment. You need to check out the whole project and follow the from-source instructions here: https://github.com/robusta-dev/krr?tab=readme-ov-file#installation-methods
I'll wait for the new release.
Maybe create an alpha release so we can test the HPA support?
This is included in the latest release! Let me know if it works for you.