
Request Prioritization by Header Value

Open mpoemsl opened this issue 1 year ago • 6 comments

🚀 The feature

Inference requests are stored in a prioritized data structure. The priority of a request can be set via a custom header value. The priority values are categorical (e.g. LOW, HIGH). Workers retrieve jobs from the data structure according to configurable probabilities (e.g. a worker retrieves the next LOW priority job with a probability of about 33% and the next HIGH priority job with a probability of about 67%).

The feature is optional and backwards compatible: requests that do not set the header keep the current FIFO queue behavior.
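
To make the proposal concrete, here is a minimal Python sketch of the idea, not the actual implementation. The header name `X-Request-Priority`, the class `PrioritizedJobQueue`, and the choice to map requests without the header to a configurable default level are all assumptions made for illustration. Each priority level gets its own FIFO queue, and a worker picks a non-empty level at random according to the configured weights.

```python
import random
from collections import deque
from typing import Any, Dict, Mapping, Optional

# Hypothetical header name, chosen only for this sketch.
PRIORITY_HEADER = "X-Request-Priority"


class PrioritizedJobQueue:
    """One FIFO queue per categorical priority, sampled by weight."""

    def __init__(
        self,
        weights: Mapping[str, float],
        default_priority: str,
        rng: Optional[random.Random] = None,
    ) -> None:
        if default_priority not in weights:
            raise ValueError("default_priority must be one of the configured levels")
        if any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive")
        self._weights = dict(weights)
        self._default = default_priority
        self._queues: Dict[str, deque] = {p: deque() for p in weights}
        self._rng = rng or random.Random()

    def put(self, job: Any, headers: Mapping[str, str]) -> str:
        """Enqueue a job; a missing or unknown header value uses the default level."""
        # Simplification: real HTTP header names are case-insensitive.
        value = headers.get(PRIORITY_HEADER, "").upper()
        priority = value if value in self._queues else self._default
        self._queues[priority].append(job)
        return priority

    def get(self) -> Optional[Any]:
        """Pick a non-empty level by weight, then pop its oldest job."""
        candidates = [p for p, q in self._queues.items() if q]
        if not candidates:
            return None
        weights = [self._weights[p] for p in candidates]
        chosen = self._rng.choices(candidates, weights=weights, k=1)[0]
        return self._queues[chosen].popleft()

    def depth(self) -> Dict[str, int]:
        """Queue length per priority, e.g. as input for a monitoring metric."""
        return {p: len(q) for p, q in self._queues.items()}


if __name__ == "__main__":
    queue = PrioritizedJobQueue({"LOW": 1, "HIGH": 2}, default_priority="LOW")
    queue.put("batch-job", {PRIORITY_HEADER: "LOW"})
    queue.put("premium-job", {PRIORITY_HEADER: "HIGH"})
    queue.put("legacy-job", {})  # no header: falls back to the default level
    print(queue.depth())
    print(queue.get())
```

In this sketch, if no request sets the header, every job lands in the same queue and is served in plain FIFO order. Weights of 1 and 2 correspond to the one-third and two-thirds split from the example above whenever both levels have jobs waiting; when only one level has jobs, it is always served.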

Motivation, pitch

In a high-load scenario, users may want to prioritize certain job types over others (e.g. premium users' requests could have a high priority, while jobs that are not time sensitive could be deprioritized).

Alternatives

This could in theory be accomplished by serving multiple versions of the same model, but this would use more resources than serving a single model with request prioritization.

Additional context

I implemented this feature as part of my work at @textshuttle in a manner customized to their products' needs. If there is interest, I could create a more general PR for it. Note that users who do not need this feature can simply leave the header value unset and retain the current behavior.

mpoemsl avatar Apr 29 '23 19:04 mpoemsl

This feature would also be accompanied by a metric to monitor the queue status by priority, possibly addressing #2101.

mpoemsl avatar Apr 29 '23 19:04 mpoemsl

@mpoemsl It makes sense to support a priority job queue. It would be great if you could contribute your code.

lxning avatar May 01 '23 18:05 lxning

Thanks @lxning! I will make a PR for this feature soon.

mpoemsl avatar May 03 '23 20:05 mpoemsl

@mpoemsl Are you still planning to implement this feature soon?

AIexanderDicke avatar Nov 21 '23 12:11 AIexanderDicke

Hi @AIexanderDicke! I could implement the HTTP part soon (disclaimer: soon means in the next few ~~weeks~~ months) if that would be useful. On another feature PR that works with header values, I'm currently blocked by my lack of familiarity with gRPC, which would probably also prevent me from fully implementing this feature. I don't think I'll have time to read up on gRPC soon.

mpoemsl avatar Nov 21 '23 13:11 mpoemsl

That would be great @mpoemsl!

AIexanderDicke avatar Nov 22 '23 05:11 AIexanderDicke