fiaas-deploy-daemon

Use API v2 for HPA creation

Open apankratov opened this issue 4 years ago • 1 comments

FDDEP-0010: Switch API version for HPA object creation to v2beta2

Summary

Currently, HPA objects are created with API v1, which supports only a very limited set of scaling metrics (in practice, pod CPU utilization via targetCPUUtilizationPercentage). We need to switch the API version to v2 so that users can autoscale deployments on an extended set of metrics (e.g. those provided by Prometheus).

Motivation

  1. This change will make it possible to extend the PAAS spec so users can scale their deployments based on relevant metrics.
  2. It will solve the "sidecar problem": with HPA v1, CPU utilization is averaged across all containers in a pod, so a mostly idle sidecar can lower the pod average and prevent scaling even when the application container needs it.

Goals

The goal is just to switch HPA to v2, which will:

  • maintain backward compatibility, i.e. the current PAAS spec for replicas will work exactly as before in terms of scaling logic
  • add the possibility to scale on various resource metrics
  • allow extending the FIAAS ecosystem with scaling based on business metrics

Non-Goals

  • add support for custom metrics
  • update PAAS spec

Proposal

User stories

Sidecar problem

As a user, when I have a pod with one or more sidecars (for example a datadog agent) and specify a CPU threshold of 75% for scaling, the following can happen: if the datadog agent's CPU usage is 0% while the application container's CPU usage is 90%, the average will be below 75% and scaling up won't happen.
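The arithmetic behind this story can be checked with a short sketch. It assumes, for simplicity, that both containers have equal CPU requests, so the pod-level utilization reduces to a plain mean of the per-container values. The function name is hypothetical and not part of FDD.

```python
def pod_cpu_utilization(container_utilizations):
    """Average per-container CPU utilization (percent) across a pod.

    Assumption: all containers have equal CPU requests, so the pod-level
    figure is a simple mean of the per-container values.
    """
    return sum(container_utilizations) / len(container_utilizations)


threshold = 75
# Application container at 90%, idle datadog agent sidecar at 0%.
pod_average = pod_cpu_utilization([90, 0])
should_scale_up = pod_average > threshold
print(pod_average, should_scale_up)
```

With these numbers the pod average is 45%, well under the 75% threshold, even though the application container alone is far above it.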

Business metrics

As a user, I want to scale my deployment based on a relevant business metric (e.g. request rate). With HPA v1, I simply can't specify such a metric.

⚠️ Caveats ⚠️

HPA API v2 was introduced in k8s version 1.9+, so simply switching to API v2 will break backward compatibility for older cluster versions.

Mitigations

There are several possible ways to mitigate the caveat listed above:

  1. We can simply state in the release notes that the new version includes breaking changes and that older k8s versions are no longer supported.
  2. We can "branch" the development and make tagged releases.
  3. Instead of just switching to API v2, we can maintain backward compatibility by adding a cluster version detection feature that uses API v1 for older clusters and v2 for newer ones.
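Option 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: instead of parsing the cluster version number, it checks which autoscaling API versions the cluster advertises through the discovery endpoint GET /apis/autoscaling, whose response is an APIGroup object with a `versions` list. The function name and the preference order are hypothetical, and the HTTP call itself is left out so the logic stays testable.

```python
def select_hpa_api_version(api_group,
                           preferred=("autoscaling/v2beta2", "autoscaling/v1")):
    """Pick the first preferred HPA API version that the cluster serves.

    `api_group` is the parsed JSON body of GET /apis/autoscaling
    (an APIGroup object with a `versions` list).
    """
    served = {v["groupVersion"] for v in api_group.get("versions", [])}
    for candidate in preferred:
        if candidate in served:
            return candidate
    raise RuntimeError("cluster serves no supported autoscaling API version")


# Hand-written discovery responses for an older and a newer cluster.
old_cluster = {"versions": [{"groupVersion": "autoscaling/v1", "version": "v1"}]}
new_cluster = {"versions": [
    {"groupVersion": "autoscaling/v1", "version": "v1"},
    {"groupVersion": "autoscaling/v2beta2", "version": "v2beta2"},
]}

print(select_hpa_api_version(old_cluster))
print(select_hpa_api_version(new_cluster))
```

Checking served API versions rather than the cluster version string avoids hard-coding a version cutoff, at the cost of one extra discovery request.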

Design details

💡 Ideally we should probably switch to the official k8s client library instead of the fiaas one, but that would require too much refactoring and is out of the scope of this proposal.

  1. We need to add models for the autoscaling v2 objects to fiaas/k8s
  2. We need to update the FDD autoscaler deployer to use the new models
  3. We need to keep supporting the current paas.yaml spec, so the current metrics specification must be transformed to the v2 format, i.e. targetCPUUtilizationPercentage: XX should be transformed to the following v2 metrics spec:
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: XX
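
The transformation in step 3 might look roughly like the sketch below. It uses plain dictionaries rather than fiaas/k8s model classes, since the v2 models are still to be added; the function name is hypothetical.

```python
def cpu_threshold_to_v2_metrics(target_cpu_utilization_percentage):
    """Translate a v1-style CPU threshold into a v2 `metrics` list.

    Returns plain dicts mirroring the v2 metrics spec; real code would
    presumably build the corresponding fiaas/k8s model objects instead.
    """
    return [{
        "type": "Resource",
        "resource": {
            "name": "cpu",
            "target": {
                "type": "Utilization",
                "averageUtilization": target_cpu_utilization_percentage,
            },
        },
    }]


metrics = cpu_threshold_to_v2_metrics(75)
print(metrics)
```

Because the v2 Resource metric with a Utilization target expresses the same threshold, existing paas.yaml files would not need to change.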

Dependencies

  • https://github.com/fiaas/k8s

Implementation history

  • 2020-10-13: Proposal open for discussion

apankratov, Oct 13 '20 08:10

We are struggling with the exact same thing. Too bad I didn't see this issue before now. I have been trying to use the extension-hook-url to change the API version before the HPA is deployed, but judging from the source code, it skips that step for obvious reasons.

arealmaas, Jul 06 '22 13:07