
Knative scaling envelope

Open • howespt opened this issue 3 years ago • 4 comments

/area scaling /area monitoring /area networking

Main question: Are there any official or anecdotal accounts of how knative scales?

Background

We have been running knative serving in production for around a year now, and I've started thinking about how it will continue to scale as the number of knative services in our cluster grows. Currently we have ~1000 services, and each is aggressively garbage collected when new revisions are deployed. I've tried to find documentation on how knative scales under different conditions, but couldn't locate any. Are there any official or anecdotal accounts of how knative scales? I'd be particularly interested in anything known with regard to the following (a quick counting sketch follows the list):

  • Number of services/revisions
  • Amount of traffic distributed amongst those services
  • Anything else other users would find helpful
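
For the first point, here's a minimal sketch of how I get a quick picture of where our cluster sits on that dimension. It assumes the kubernetes Python client and the standard serving.knative.dev/v1 CRDs; it isn't an official Knative tool:

```python
# Rough sketch: count Knative Services and Revisions cluster-wide with the
# kubernetes Python client, assuming the serving.knative.dev/v1 CRDs are
# installed and a working kubeconfig is available.
from collections import Counter

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()


def count(plural: str) -> Counter:
    """Return a per-namespace count of the given Knative Serving resource."""
    objs = api.list_cluster_custom_object(
        group="serving.knative.dev", version="v1", plural=plural
    )["items"]
    return Counter(o["metadata"]["namespace"] for o in objs)


services = count("services")
revisions = count("revisions")
print(f"total services:  {sum(services.values())}")
print(f"total revisions: {sum(revisions.values())}")
```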

From my initial observations, most of the internal components perform well with the default resource allocations. Each new service appears to add to the memory footprint of the components (controller/activator/autoscaler/etc.), and the growth looks roughly linear. At least for our workloads, I expect we could scale another 2-3x before we hit memory issues, and even those should be mitigated by throwing more resources at the components.
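
To make that linear-growth guess concrete, here's a back-of-envelope sketch. The base, per-service, and limit numbers are hypothetical placeholders, not measurements from our cluster:

```python
# Back-of-envelope estimate of control-plane memory headroom, assuming the
# footprint grows roughly linearly with the number of Knative services.
# All numbers below are hypothetical placeholders; measure your own.
BASE_MIB = 200          # idle footprint of a component (e.g. the autoscaler)
PER_SERVICE_MIB = 0.5   # assumed marginal memory cost per Knative service
LIMIT_MIB = 2048        # the component's memory limit
current_services = 1000

used = BASE_MIB + PER_SERVICE_MIB * current_services
headroom_services = (LIMIT_MIB - BASE_MIB) / PER_SERVICE_MIB
print(f"estimated usage now: {used:.0f} MiB")
print(f"services before hitting the limit: {headroom_services:.0f} "
      f"(~{headroom_services / current_services:.1f}x current)")
```

Raising a component's memory limit just shifts LIMIT_MIB up, which is what I mean by throwing more resources at it.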

I guess something like this could be useful.

howespt avatar Jun 10 '22 19:06 howespt

@howespt: The label(s) area/scaling cannot be applied, because the repository doesn't have them.

In response to this:

/area scaling /area monitoring /area networking


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow[bot] avatar Jun 10 '22 19:06 knative-prow[bot]

https://knative.dev/docs/serving/autoscaling/kpa-specific/#scale-up-rate
https://knative.dev/docs/serving/load-balancing/target-burst-capacity/
Maybe this helps?

jwcesign avatar Jul 13 '22 14:07 jwcesign

@jwcesign I think the question was more about whether there are any metrics on how many unique services knative can handle, not about how a single service can scale to multiple instances.

pastjean avatar Jul 13 '22 14:07 pastjean

I have never seen anything like this. @pastjean

jwcesign avatar Jul 15 '22 03:07 jwcesign

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Oct 14 '22 01:10 github-actions[bot]

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

knative-prow-robot avatar Nov 13 '22 01:11 knative-prow-robot

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Jun 28 '23 01:06 github-actions[bot]