
Present an appropriate configuration API for scaling cf-for-k8s

Syerram opened this issue 4 years ago • 4 comments

Summary

This feature will enable Platform engineers to scale cf-for-k8s system components to match their intended scale tiers.

Use cases

  1. Deploy to a laptop: Platform engineers want to deploy cf-for-k8s to their laptop with minimal resources so that they can demo or kick the tires on cf-for-k8s.
  2. Deploy a small foundation with 10-50 prod apps
  3. Deploy a large foundation with 1000+ prod apps

We expect that Platform engineers will start with use cases 1 and 2 and eventually progress towards use case 3. As they progress, they want to control the scaling of the system components and other pertinent resources so that they can meet their intended scale while also optimizing cluster and infrastructure resource usage. For example, the current cf-for-k8s footprint is much bigger than necessary for use case 1 and possibly use case 2.

Also, see #60 for further evidence of the need to expose "scaling" properties.

What alternatives were considered

None, unless users know how to write overlays or update resources directly via kubectl [1]. Even then, it is unclear which parts of the system need to be configured to achieve the desired scale. For a new user, this is significant friction when scaling cf-for-k8s.
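For context, an overlay to bump replicas today might look roughly like the ytt sketch below; the Deployment name `cf-api-server` is an assumption, so check the rendered manifests for the actual resource name:

```yaml
#@ load("@ytt:overlay", "overlay")

#! Match the CAPI api-server Deployment (name assumed) and
#! override its replica count.
#@overlay/match by=overlay.subset({"kind": "Deployment", "metadata": {"name": "cf-api-server"}})
---
spec:
  replicas: 4
```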

[1] Scaling deployments with kubectl is out of band, and kapp may reset the change on the next upgrade (unless we ask kapp not to reset the target replica count).
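As a sketch of that request: kapp supports rebase rules that copy a field from the live resource instead of resetting it, so something along these lines would preserve an out-of-band replica change (the namespace and Deployment name here are assumptions):

```yaml
apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
# Keep the replica count already set on the live Deployment rather
# than resetting it to the value in the deployed manifests.
- path: [spec, replicas]
  type: copy
  sources: [existing]
  resourceMatchers:
  - kindNamespaceNameMatcher:
      kind: Deployment
      namespace: cf-system   # assumed namespace
      name: cf-api-server    # assumed Deployment name
```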

How will it work (AC)

The following AC applies to any component that exposes horizontal- and vertical-scaling properties.

Given I have an existing foundation installed in a cluster
When I update the replica count of the CAPI api-server from X to X + 2 in my data values
  And I deploy via kapp
Then I notice K8s schedules 2 additional CAPI api-server pods
  And I am able to cf push an app

Given I have an existing foundation installed in a cluster
  And X + 2 CAPI api-server pods are running
When I update the replica count of the CAPI api-server from X + 2 to X in my data values
  And I deploy via kapp
Then I notice K8s purges 2 CAPI api-server pods
  And I am still able to cf push an app
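As a sketch of the data-values interface these AC imply, the knob could look something like the following; the `capi.api_server.replicas` key is hypothetical and not an existing cf-for-k8s property:

```yaml
#@data/values
---
capi:
  api_server:
    #! Hypothetical knob: desired number of CAPI api-server replicas.
    replicas: 4
```

An operator would then redeploy as usual, e.g. `kapp deploy -a cf -f <(ytt -f config -f scale-values.yml)`, and confirm the additional pods were scheduled with `kubectl get pods`.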

Out of scope

This feature request does not include recommendations on ideal configuration settings for different scale tiers. Long term, we will need to provide guidance on ideal configurations for different scaling and throughput requirements. The SAP team is working on continuous load testing of cf-for-k8s, which can provide data-informed scale recommendations.

Syerram avatar Aug 05 '20 22:08 Syerram

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/174204198

The labels on this github issue will be updated when the story is started.

cf-gitbot avatar Aug 05 '20 22:08 cf-gitbot

@tcdowney do we need to scale CAPI? I remember that during our scaling experiments, scaling CAPI didn't give any significant results.

mike1808 avatar Sep 08 '20 17:09 mike1808

@mike1808 for the networking scaling experiments it didn't matter much, since we only cared about app workloads running and being able to serve traffic.

In production / production-like environments, though, operators and devs will often hit other non-push endpoints of the CF API for various reasons (observability, powering dashboards like Stratos/AppsManager, etc.). Additionally, some system components rely on the CF API to determine things like "is this user able to view logs for this app?"

If the CF API is under too much load, it can impact both of these use cases and the availability of cf push.

We've documented scaling guidance for the CF API (and associated components) here:

  • https://docs.cloudfoundry.org/running/managing-cf/scaling-cloud-controller.html
  • https://docs.cloudfoundry.org/running/managing-cf/scaling-cloud-controller-k8s.html

tcdowney avatar Sep 08 '20 17:09 tcdowney

We've made significant progress on this (see https://github.com/cloudfoundry/cf-for-k8s/blob/main/docs/platform_operators/scaling.md) and will continue this work on our in-progress scaling epic.

jamespollard8 avatar Dec 17 '20 22:12 jamespollard8