cf-for-k8s
Present an appropriate configuration API for scaling cf-for-k8s
Summary
This feature will enable Platform engineers to scale cf-for-k8s system components to match their intended scale tiers.
Use cases
- Deploy to a laptop: Platform engineers want to deploy cf-for-k8s to their laptop with minimal resources so that they can demo or kick the tires on cf-for-k8s.
- Deploy a small foundation with 10-50 production apps.
- Deploy a large foundation with 1000+ production apps.
We expect that Platform engineers will start with use cases 1 and 2 and eventually progress towards use case 3. As they progress, they will want to control the scaling of the system components and other pertinent resources so that they can meet their intended scale while also optimizing cluster and infrastructure resource usage. For example, the current cf-for-k8s footprint is much bigger than necessary for use case 1, and perhaps use case 2.
Also, see #60 for further evidence of the need to expose "scaling" properties.
What alternatives were considered
None, unless users know how to write overlays or update resources directly via kubectl [1]. Even then, it is unclear which parts of the system need to be configured to achieve the desired scale. For a new user, this is significant friction when scaling cf-for-k8s.
[1] Using kubectl to scale deployments is out of band, and kapp may reset the change on the next upgrade (unless we request that kapp not reset the target replica count).
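For illustration, the overlay route today might look like the sketch below. The Deployment name `cf-api-server` is a hypothetical placeholder; the actual resource names in a given cf-for-k8s release may differ:

```yaml
#@ load("@ytt:overlay", "overlay")

#! Match the CAPI API server Deployment by kind and name.
#! "cf-api-server" is an assumed name; check the rendered
#! manifests for the real Deployment name in your version.
#@overlay/match by=overlay.subset({"kind": "Deployment", "metadata": {"name": "cf-api-server"}})
---
spec:
  replicas: 3
```

Writing such an overlay requires knowing both ytt overlay syntax and the internal resource names, which is exactly the friction a first-class configuration API would remove.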
How will it work (AC)
The following AC applies to any component that exposes horizontal and vertical scaling properties.
Given I have an existing foundation installed in a cluster
When I update the replica count of the CAPI api-server from X to X + 2 in my data values
And I deploy via kapp
Then I notice K8s schedules 2 additional CAPI api-server pods
And I am able to cf-push an app
Given I have an existing foundation installed in a cluster
And X + 2 CAPI api-server pods are running
When I update the replica count of the CAPI api-server from X + 2 to X in my data values
And I deploy via kapp
Then I notice K8s purges 2 CAPI api-server pods
And I am still able to cf-push an app
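The AC above could be exercised with a data-values fragment like the sketch below. The `capi.api_server.replicas` key is a hypothetical example of what such a property could look like, not a confirmed schema:

```yaml
#@data/values
---
capi:
  api_server:
    #! Hypothetical property, pending the actual data-values schema.
    #! Bumping this from X to X + 2 should result in 2 additional
    #! CAPI api-server pods after the next deploy.
    replicas: 5
```

The foundation would then be re-rendered and deployed as usual, e.g. `ytt -f config -f cf-values.yml > rendered.yml` followed by `kapp deploy -a cf -f rendered.yml`, letting kapp reconcile the replica count declaratively rather than out of band.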
Out of scope
This feature request does not include recommendations on ideal configuration settings for different scale tiers. Long term, we will need to provide guidance on ideal configurations for different scaling and throughput requirements. The SAP team is working on a continuous load test of cf-for-k8s, which can provide data-informed scale recommendations.
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/174204198
The labels on this github issue will be updated when the story is started.
@tcdowney do we need to scale CAPI? I remember that during our scaling experiments, scaling CAPI didn't yield any significant results.
@mike1808 for the networking scaling experiments it didn't have much impact, since we only cared about app workloads running and being able to serve traffic.
In production / production-like environments, though, operators and devs will often hit other non-push endpoints of the CF API for various reasons (observability, powering dashboards like Stratos/AppsManager, etc.). Additionally, some system components rely on the CF API to determine things like "is this user able to view logs for this app?"
If it's under too much load, it can impact both of these use cases and the availability of cf push.
We've documented scaling guidance for the CF API (and associated components) here:
- https://docs.cloudfoundry.org/running/managing-cf/scaling-cloud-controller.html
- https://docs.cloudfoundry.org/running/managing-cf/scaling-cloud-controller-k8s.html
We've made significant progress on this (see https://github.com/cloudfoundry/cf-for-k8s/blob/main/docs/platform_operators/scaling.md) and will continue this work in our in-progress scaling epic.