cluster-api-provider-openstack

Document guidelines around CAPI + CAPO cluster scalability

Open cunningr opened this issue 3 years ago • 2 comments

/kind documentation

This is not a code issue specifically, but we were wondering whether any data or guidelines exist around CAPO cluster scalability. We are thinking more specifically about:

  • Number of workload clusters
  • Number of active nodes (would this be provider-specific?)
  • Number of health checks
  • Other important parameters?

If ballpark numbers don't exist today, do we know which parameters are important to track for CAPI cluster scalability, and what behaviours we might expect to see as we start to stretch its capabilities?

And finally, are there any guidelines for scaling a CAPI cluster in terms of which resources can be scaled up/out?

Note that this issue is directly related to https://github.com/kubernetes-sigs/cluster-api/issues/7308, raised in the CAPI project; however, for our needs we are specifically interested in CAPI + CAPO.

cunningr • Sep 29 '22

> This is not a code issue specifically, but we were wondering whether any data or guidelines exist around CAPO cluster scalability.

This seems to be a user/operator question rather than a developer one. From a developer perspective it might be tough to answer, since we don't run such operations ourselves and don't have the hardware resources. @seanschneeweiss, not sure whether you have any insight?

jichenjc • Sep 30 '22

I remember watching this video: https://www.youtube.com/watch?v=KzYV-fJ_wH0. Thanks for the great share, @seanschneeweiss :)

jichenjc • Sep 30 '22

We reached more than 350 clusters with more than 1990 machines in one of our OpenStack regions. At this scale we have had to start analyzing, because provisioning takes longer than usual, with unexplained waiting times between two controllers. I'll share more information as soon as we have narrowed down the problem we are currently facing. However, CAPO doesn't seem to be the bottleneck; I think the OpenStack API may become a problem soon, rather than the controller itself. We are currently using the following concurrency values:

 - --openstackcluster-concurrency=10
 - --openstackmachine-concurrency=20

Memory limit/request is set to 500Mi, but actual usage is 140Mi; it can spike on pod restart. CPU request is set to 500m, but actual usage is 160m. Of course, this can increase during updates (new machines).
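For anyone wanting to try similar settings, the concurrency flags and resource values reported here could be expressed as a strategic-merge patch for the CAPO controller Deployment. This is a minimal sketch under stated assumptions, not an official manifest: it assumes the controller runs as Deployment `capo-controller-manager` in namespace `capo-system` with a container named `manager` (typical for a clusterctl install, but verify against your own cluster). CPU quantities are written in millicores (`500m`).

```yaml
# Hypothetical strategic-merge patch for the CAPO controller Deployment.
# The container name "manager" is an assumption; check your installation.
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            # Concurrency values reported in this thread. Strategic merge
            # replaces the whole args list, so repeat any existing flags here.
            - --openstackcluster-concurrency=10
            - --openstackmachine-concurrency=20
          resources:
            requests:
              cpu: 500m        # observed usage around 160m
              memory: 500Mi    # observed usage around 140Mi, spikes on restart
            limits:
              memory: 500Mi
```

It could be applied with `kubectl -n capo-system patch deployment capo-controller-manager --patch-file capo-scaling-patch.yaml` (the file name is hypothetical). Because `args` is a plain list of strings, a strategic-merge patch replaces it entirely, so any arguments already set on the container must also be listed in the patch.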

Sean Schneeweiss, Mercedes-Benz Tech Innovation GmbH

seanschneeweiss • Nov 12 '22

> We reached more than 350 clusters with more than 1990 machines in one of our OpenStack regions.

Those are some really cool numbers. Would love to chat about how OpenStack is slowing down and about profiling that part!

mnaser • Nov 14 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot • Feb 23 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot • Mar 25 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot • Apr 24 '23

@k8s-triage-robot: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot • Apr 24 '23

@mnaser

> Those are some really cool numbers. Would love to chat about how OpenStack is slowing down and about profiling that part!

It is not OpenStack that seems to be slowing down. It is probably related to slowness of the operators, but I'm not sure yet.

seanschneeweiss • Apr 24 '23

/remove-lifecycle stale

I don't know what is going to be done in this issue. Let's keep it open, and if there is no further activity later on, let's close it.

jichenjc • Apr 25 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot • May 25 '23

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot • May 25 '23