ml-commons
[FEATURE] Add rate limiter to limit model usage
Most model service providers enforce throttling, but ml-commons currently has no rate limiting of its own. That may trigger provider-side throttling if users don't control the request rate.
At which level should we rate limit: user level, model level, or something else?
Agent level.
Hi @ylwu-amzn , can you please update this in the scorecard project to be in the 2.12 train? Also, the status from 2.11 is 'no doc needed'. Is that still the case? Thanks so much.
I think we need docs for this feature. Created a doc issue: https://github.com/opensearch-project/documentation-website/issues/5839
Control whether a model is enabled; if a model is disabled, users can't run predict on it:
```
PUT /_plugins/_ml/models/<MODEL_ID>
{
  "is_enabled": false
}
```
Set the rate limiter parameters to allow, for example, two requests per minute (on average, one request every 30 seconds):
```
PUT /_plugins/_ml/models/<MODEL_ID>
{
  "rate_limiter": {
    "limit": "2",
    "unit": "MINUTES"
  }
}
```
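The `limit`/`unit` pair can be read as a token-bucket-style budget: `limit` requests are allowed per `unit`, refilling continuously. A minimal sketch of that semantics (illustrative only; `TokenBucket` is a hypothetical helper, not the actual ml-commons implementation):

```python
import time

class TokenBucket:
    """Token-bucket sketch of a "limit N per unit" rate limiter.

    Illustrative only; the actual ml-commons implementation may differ.
    """

    UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600}

    def __init__(self, limit, unit, clock=time.monotonic):
        self.capacity = float(limit)
        self.rate = float(limit) / self.UNIT_SECONDS[unit]  # tokens per second
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Demo with a fake clock: "limit": "2", "unit": "MINUTES" means
# two requests immediately, then roughly one every 30 seconds.
t = [0.0]
bucket = TokenBucket(limit="2", unit="MINUTES", clock=lambda: t[0])
print(bucket.try_acquire())  # True
print(bucket.try_acquire())  # True
print(bucket.try_acquire())  # False -- budget exhausted
t[0] += 30.0                 # 30 seconds later, one token has refilled
print(bucket.try_acquire())  # True
```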
Create user-level rate limits for a specific model:
```
POST _plugins/_ml/controllers/<MODEL_ID>
{
  "user_rate_limiter": {
    "user1": {
      "limit": 3,
      "unit": "MINUTES"
    },
    "user2": {
      "limit": 4,
      "unit": "MINUTES"
    }
  }
}
```
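One way to read the per-user payload: each listed user gets their own counter over the configured window, and unlisted users are unaffected. A sketch of that behavior (fixed-window counting; `UserRateLimiter` is a hypothetical helper, not the plugin's actual implementation):

```python
class UserRateLimiter:
    """Fixed-window, per-user limiter mirroring the controller payload.

    Hypothetical sketch only; not the actual ml-commons implementation.
    """

    UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600}

    def __init__(self, user_rate_limiter):
        # e.g. {"user1": {"limit": 3, "unit": "MINUTES"}, ...}
        self.config = user_rate_limiter
        self.windows = {}  # user -> [window_start, request_count]

    def allow(self, user, now):
        cfg = self.config.get(user)
        if cfg is None:
            return True  # users without an entry are not throttled here
        window = self.UNIT_SECONDS[cfg["unit"]]
        start, count = self.windows.get(user, (now, 0))
        if now - start >= window:
            start, count = now, 0  # old window expired; start a new one
        if count < int(cfg["limit"]):
            self.windows[user] = [start, count + 1]
            return True
        return False

# Demo matching the request body above.
rl = UserRateLimiter({
    "user1": {"limit": 3, "unit": "MINUTES"},
    "user2": {"limit": 4, "unit": "MINUTES"},
})
print([rl.allow("user1", now=0) for _ in range(4)])  # [True, True, True, False]
print(rl.allow("user1", now=60))  # True -- a new one-minute window started
```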
Questions from @austintlee at the community meeting:
- Will it auto-scale when a new node is added to the cluster?
- Role-based throttling?
To add on: how does scaling the cluster down work?
We are not going to support auto scaling or role-based throttling. For the scaling case, users need to call the update API to reapply the throttling settings to the cluster.