
How to block auto-scaling in specific situations

Open jinholee-makinarocks opened this issue 1 year ago • 2 comments

Ask your question here:

We use the Knative and KServe projects in our product to provide inference services with auto-scaling. In some cases, we need to pause auto-scaling to handle resources across multiple tenants. For example, we have two tenants allocated the same amount of resources, as follows:

  • 1st tenant: CPU 10 cores, Memory 10 GiB
  • 2nd tenant: CPU 10 cores, Memory 10 GiB

If we deploy lots of objects, such as InferenceService or Service, the 2nd tenant may exhaust its own resources. In this situation, we have to block auto-scaling in order to prevent the 2nd tenant from consuming the resources of the 1st tenant.

Could you suggest any ideas or an approach for solving this?

jinholee-makinarocks • Jul 05 '24 00:07

Hi @jinholee-makinarocks. I suspect you could map your tenant concept to one or more namespaces and apply quotas per namespace to reflect the total amount of resources per tenant. Knative services run in a namespace and scaling happens within that namespace, so you could restrict the resources there. Are you looking for something else?
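
As a rough illustration of the per-namespace quota idea, here is a minimal sketch that generates one Kubernetes `ResourceQuota` per tenant namespace. The namespace names (`tenant-1`, `tenant-2`), the quota name `tenant-quota`, and the helper `tenant_quota` are hypothetical; the 10 CPU / 10Gi figures come from the question. This is one possible setup under those assumptions, not a confirmed recommendation for your cluster.

```python
import json


def tenant_quota(namespace: str, cpu: str, memory: str) -> dict:
    """Build a ResourceQuota manifest capping total requests and limits in a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # "tenant-quota" is a hypothetical object name
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
            }
        },
    }


# Hypothetical namespaces, one per tenant, each capped at 10 CPU / 10Gi
quotas = [tenant_quota(ns, cpu="10", memory="10Gi") for ns in ("tenant-1", "tenant-2")]

# Wrap both quotas in a v1 List so they can be applied in one step
manifest = {"apiVersion": "v1", "kind": "List", "items": quotas}
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. Once a quota covers CPU or memory in a namespace, new pods there that would push the total over the cap are rejected, so the autoscaler stops adding replicas for that tenant only. Note that pods must then declare requests and limits for the quota'd resources (a `LimitRange` can supply defaults), otherwise they are rejected as well.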

skonto • Jul 11 '24 10:07

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] • Oct 10 '24 01:10