How to block auto-scaling in specific situations
We use the Knative and KServe projects in our product to provide inference services with auto-scaling. In some cases, we need to pause auto-scaling as part of our multi-tenant resource handling. For example, we have two tenants, each allocated the same amount of resources:
- 1st tenant: CPU 10 cores, Memory 10 GiB
- 2nd tenant: CPU 10 cores, Memory 10 GiB
If the 2nd tenant deploys many objects, such as InferenceService or Service, it can exhaust its own resources. In this situation, we have to block auto-scaling to prevent the 2nd tenant from consuming the resources of the 1st tenant.
Could you suggest an approach to solving this?
Hi @jinholee-makinarocks. I suspect you could map your tenant concept to one or more namespaces and apply quotas per namespace to reflect the total amount of resources per tenant. Knative services run in a namespace and scaling happens within that namespace, so you could restrict the resources there. Are you looking for something else?
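To make the per-namespace quota suggestion concrete, here is a minimal sketch that generates one Kubernetes `ResourceQuota` per tenant, using the 10 CPU / 10 GiB figures from the question. The namespace names `tenant-1` and `tenant-2`, the quota name `tenant-quota`, and the one-namespace-per-tenant mapping are assumptions made for illustration; they are not taken from the thread.

```python
import json


def tenant_quota(namespace: str, cpu_cores: int, memory_gib: int) -> dict:
    """Build a ResourceQuota manifest that caps the summed pod requests in one namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # "tenant-quota" is a hypothetical object name.
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu_cores),
                "requests.memory": f"{memory_gib}Gi",
            }
        },
    }


# Hypothetical namespace names; values are (CPU cores, memory in GiB) per tenant.
TENANTS = {"tenant-1": (10, 10), "tenant-2": (10, 10)}


def build_manifest_list(tenants: dict) -> dict:
    """Wrap all quotas in a single List object so they can be applied in one call."""
    items = [tenant_quota(ns, cpu, mem) for ns, (cpu, mem) in tenants.items()]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(build_manifest_list(TENANTS), indent=2))
```

The output could be piped into `kubectl apply -f -`. With such a quota in place, new pods in a namespace should be rejected at admission once the summed requests reach the limit, so scale-up for one tenant would stop at its own allocation instead of taking capacity from the other. Note that once a quota covers a resource, pods that do not declare a request for it are rejected, so a `LimitRange` with default requests may also be needed.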
This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.