ml-commons
[FEATURE] Add rate limiter to limit model usage
Most model service providers enforce throttling, but ml-commons currently has no rate limiting of its own. That may trigger provider-side throttling if users don't control the request rate.
At which level should we rate limit: user level, model level, or something else?
Agent level.
Hi @ylwu-amzn , can you please update this in the scorecard project to be in the 2.12 train? Also, the status from 2.11 is 'no doc needed'. Is that still the case? Thanks so much.
I think we need docs for this feature. Created a doc issue: https://github.com/opensearch-project/documentation-website/issues/5839
Control whether a model is enabled; if a model is disabled, users can't run predict on it:
```
PUT /_plugins/_ml/models/<MODEL_ID>
{
  "is_enabled": false
}
```
Set the rate limiter parameters to allow, for example, two requests per minute (on average, one request every 30 seconds):
```
PUT /_plugins/_ml/models/<MODEL_ID>
{
  "rate_limiter": {
    "limit": "2",
    "unit": "MINUTES"
  }
}
```
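The `limit`/`unit` pair can be read as a token-bucket-style budget: `limit` requests are allowed per `unit`, refilling continuously. A minimal sketch of that semantics (illustrative only; `TokenBucket` is a hypothetical helper, not the actual ml-commons implementation):

```python
import time

class TokenBucket:
    """Token-bucket sketch of a "limit N per unit" rate limiter.

    Illustrative only; the actual ml-commons implementation may differ.
    """

    UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600}

    def __init__(self, limit, unit, clock=time.monotonic):
        self.capacity = float(limit)
        self.rate = float(limit) / self.UNIT_SECONDS[unit]  # tokens per second
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Demo with a fake clock: "limit": "2", "unit": "MINUTES" means
# two requests immediately, then roughly one every 30 seconds.
t = [0.0]
bucket = TokenBucket(limit="2", unit="MINUTES", clock=lambda: t[0])
print(bucket.try_acquire())  # True
print(bucket.try_acquire())  # True
print(bucket.try_acquire())  # False -- budget exhausted
t[0] += 30.0                 # 30 seconds later, one token has refilled
print(bucket.try_acquire())  # True
```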
Create user-level rate limits for a specific model:
```
POST _plugins/_ml/controllers/<MODEL_ID>
{
  "user_rate_limiter": {
    "user1": {
      "limit": 3,
      "unit": "MINUTES"
    },
    "user2": {
      "limit": 4,
      "unit": "MINUTES"
    }
  }
}
```
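One way to read the per-user payload: each listed user gets their own counter over the configured window, and unlisted users are unaffected. A sketch of that behavior (fixed-window counting; `UserRateLimiter` is a hypothetical helper, not the plugin's actual implementation):

```python
class UserRateLimiter:
    """Fixed-window, per-user limiter mirroring the controller payload.

    Hypothetical sketch only; not the actual ml-commons implementation.
    """

    UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600}

    def __init__(self, user_rate_limiter):
        # e.g. {"user1": {"limit": 3, "unit": "MINUTES"}, ...}
        self.config = user_rate_limiter
        self.windows = {}  # user -> [window_start, request_count]

    def allow(self, user, now):
        cfg = self.config.get(user)
        if cfg is None:
            return True  # users without an entry are not throttled here
        window = self.UNIT_SECONDS[cfg["unit"]]
        start, count = self.windows.get(user, (now, 0))
        if now - start >= window:
            start, count = now, 0  # old window expired; start a new one
        if count < int(cfg["limit"]):
            self.windows[user] = [start, count + 1]
            return True
        return False

# Demo matching the request body above.
rl = UserRateLimiter({
    "user1": {"limit": 3, "unit": "MINUTES"},
    "user2": {"limit": 4, "unit": "MINUTES"},
})
print([rl.allow("user1", now=0) for _ in range(4)])  # [True, True, True, False]
print(rl.allow("user1", now=60))  # True -- a new one-minute window started
```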
Questions from @austintlee at the community meeting:
- Will it auto-scale when a new node is added to the cluster?
- Role-based throttling?
To add on: how does scaling the cluster down work?
We are not going to support auto scaling or role-based throttling. For the scaling case, users need to call the update API to reapply the throttling settings to the cluster.