fiaas-deploy-daemon
Use API v2 for HPA creation
FDDEP-0010: Switch API version for HPA object creation to v2beta2
Summary
Currently, when an HPA object is created, it uses API v1, which supports only a very limited set of metrics for scaling the deployment (pod CPU utilization and memory). We need to switch to API v2 so that users can use an extended set of metrics (e.g. those provided by Prometheus) for deployment autoscaling.
Motivation
- This change will make it possible to extend the PAAS spec so that users can scale their deployments based on relevant metrics.
- It will solve the "sidecar problem": with HPA v1, CPU utilization is averaged across all containers within a pod, so sidecars with low CPU usage can pull the pod average down and lead to improper scaling.
Goals
The goal is just to switch HPA to v2, which will:
- maintain backward compatibility, i.e. the current PAAS spec for replicas will work exactly as before in terms of scaling logic
- add the possibility to scale on various resource metrics
- allow extending the FIAAS ecosystem with scaling based on business metrics
Non-Goals
- add support for custom metrics
- update PAAS spec
Proposal
User stories
Sidecar problem
As a user, when I have a pod with sidecar(s) (for example a datadog agent) and specify a CPU threshold of 75% for scaling, if the datadog agent's CPU usage is 0% while the application container's CPU usage is 90%, the average will be below 75% and scaling up won't happen.
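The arithmetic behind this story can be sketched as follows. This is a simplified model, not the HPA controller's actual code: it assumes both containers request the same amount of CPU (100m, a made-up figure), so that pod-level utilization is the plain average described above. The function name `pod_cpu_utilization` is hypothetical.

```python
def pod_cpu_utilization(containers):
    """Return pod CPU utilization in percent: total usage over total requests.

    `containers` is a list of (usage_millicores, request_millicores) tuples.
    """
    total_usage = sum(usage for usage, _ in containers)
    total_request = sum(request for _, request in containers)
    return 100.0 * total_usage / total_request


# Assumed figures: each container requests 100m CPU.
app = (90, 100)      # application container at 90% of its request
sidecar = (0, 100)   # idle sidecar (e.g. datadog agent)

utilization = pod_cpu_utilization([app, sidecar])
threshold = 75
print(utilization)              # 45.0
print(utilization > threshold)  # False, so no scale-up is triggered
```

Even though the application container is well above the 75% threshold on its own, the pod-level figure stays at 45%.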
Business metrics
As a user, I want to scale my deployment based on a relevant business metric (e.g. request rate). With HPA v1 I just can't specify this metric.
⚠️ Caveats ⚠️
The HPA API v2 is only available in k8s version 1.9+, so simply switching to API v2 would break backward compatibility for older cluster versions.
Mitigations
There are several possible options of how to mitigate the caveat listed above:
- We can just state in the release notes that the new version includes breaking changes and that older k8s versions are no longer supported.
- We can "branch" the development and make tagged releases.
- Instead of just switching to API v2, we can maintain backward compatibility by adding a cluster version detection feature that uses API v1 for older clusters and v2 for newer ones.
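A minimal sketch of what the version detection option could look like is given here. It assumes the cluster's major and minor version are already available as strings, in the form the Kubernetes `/version` endpoint reports them (the minor value may carry a suffix such as `12+`). The 1.9 threshold is taken from the caveat above, and the helper names `parse_version` and `select_hpa_api_version` are hypothetical, not part of FDD or fiaas/k8s.

```python
import re

# Assumption taken from the caveat above: the v2 HPA API needs k8s 1.9 or newer.
MIN_V2_VERSION = (1, 9)


def parse_version(major, minor):
    """Convert major/minor strings (minor may look like "12+") to an int tuple."""
    def digits(value):
        # Keep only the leading digits, dropping suffixes such as "+".
        return int(re.match(r"\d+", value).group(0))
    return (digits(major), digits(minor))


def select_hpa_api_version(major, minor):
    """Pick the autoscaling API version for a cluster (hypothetical helper)."""
    if parse_version(major, minor) >= MIN_V2_VERSION:
        return "autoscaling/v2beta2"
    return "autoscaling/v1"


print(select_hpa_api_version("1", "8"))    # autoscaling/v1
print(select_hpa_api_version("1", "12+"))  # autoscaling/v2beta2
```

Comparing tuples rather than strings avoids treating "1.10" as older than "1.9".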
Design details
💡 Ideally we should probably switch to using the official k8s client library instead of the fiaas one, but that would require too much refactoring, which is out of the scope of this proposal.
- We need to add models for the autoscaling v2 objects to fiaas/k8s
- We need to update the FDD autoscaler deployer to use the new models
- We need to support the current paas.yaml spec, so we need to add a transformation of the current metrics specification to the v2 format, i.e. targetCPUUtilizationPercentage: XX should be transformed to the v2 metrics spec:

      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: XX
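The transformation described above could be sketched roughly like this. The sketch builds plain dictionaries rather than the fiaas/k8s models, since the v2 models still have to be added, and the function name `cpu_target_to_v2_metrics` is a hypothetical one chosen for illustration.

```python
def cpu_target_to_v2_metrics(target_cpu_utilization_percentage):
    """Build the v2 `metrics` list equivalent to a v1 CPU utilization target."""
    return [
        {
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {
                    "type": "Utilization",
                    "averageUtilization": target_cpu_utilization_percentage,
                },
            },
        }
    ]


# Example: the v1 setting `targetCPUUtilizationPercentage: 50`
metrics = cpu_target_to_v2_metrics(50)
print(metrics[0]["resource"]["target"])
```

Because the existing spec only carries a single CPU percentage, the result is always a one-element list, which keeps the scaling behaviour identical to the v1 setting.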
Dependencies
- https://github.com/fiaas/k8s
Implementation history
- 2020-10-13: Proposal open for discussion
We are struggling with the exact same thing. Too bad I didn't see this issue before now. I've been trying to use the extension-hook-url to change the API version before it's deployed, but from what I saw in the source code it skips doing that, for obvious reasons.