Knative scaling envelope
/area scaling /area monitoring /area networking
Main question: Are there any official or anecdotal accounts of how knative scales?
Background
We have been running Knative Serving in production for around a year now, and I've been wondering how it will continue to scale as the number of Knative services in our cluster grows. We currently have ~1000 services, with old revisions aggressively garbage collected as new ones are deployed. I've searched for documentation on how Knative scales under different conditions but couldn't find any. Are there any official or anecdotal accounts of how Knative scales? I'd be particularly interested in anything known with regard to:
- Number of services/revisions
- Amount of traffic distributed amongst those services
- Anything else other users would find helpful
From my initial observations, most of the internal components perform well with the default resource allocations. Each new service appears to add to the memory footprint of the components (controller/activator/autoscaler/etc.), and the growth looks linear. For our workloads I'd expect we could scale another 2-3x before hitting memory issues, and even those should be mitigated by giving the components more resources.
I guess something like this could be useful.
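To put some rough numbers behind the linear-memory observation, here is a minimal sketch of how it could be sampled. It assumes the Python kubernetes client, cluster access, and metrics-server, and it simply treats every pod in the knative-serving namespace as a control-plane component; recording the pair (service count, per-component memory) over time should show whether the growth really is linear.

```python
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

# Count Knative Services across the cluster (serving.knative.dev/v1).
ksvcs = custom.list_cluster_custom_object(
    group="serving.knative.dev", version="v1", plural="services")
print(f"knative services: {len(ksvcs['items'])}")

# Current memory usage of the Serving control-plane pods via metrics.k8s.io
# (requires metrics-server). Plot this against the service count over time.
metrics = custom.list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1",
    namespace="knative-serving", plural="pods")
for pod in metrics["items"]:
    for container in pod["containers"]:
        print(pod["metadata"]["name"], container["name"],
              container["usage"]["memory"])
```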
@howespt: The label(s) area/scaling cannot be applied, because the repository doesn't have them.
https://knative.dev/docs/serving/autoscaling/kpa-specific/#scale-up-rate
https://knative.dev/docs/serving/load-balancing/target-burst-capacity/
Maybe these help?
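For reference, both of those knobs are cluster-level settings in the config-autoscaler ConfigMap in the knative-serving namespace. A quick way to check what a cluster currently uses, sketched with the Python kubernetes client (keys that are absent mean the built-in defaults apply):

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Both settings live in the config-autoscaler ConfigMap (knative-serving ns).
cm = core.read_namespaced_config_map("config-autoscaler", "knative-serving")
data = cm.data or {}
for key in ("max-scale-up-rate", "target-burst-capacity"):
    print(key, "=", data.get(key, "<default>"))
```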
@jwcesign I think the question was more around whether there are any metrics on how many unique services Knative can handle, not how a single service can scale to multiple instances.
I've never seen anything like this. @pastjean
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
This issue or pull request is stale because it has been open for 90 days with no activity.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
/lifecycle stale