mosec
feat: support iteration-level scheduling
See also https://www.usenix.org/conference/osdi22/presentation/yu
Originally posted by @VoVAllen in https://github.com/mosecorg/mosec/issues/382#issuecomment-1588622255
Although Orca couples the scheduler and the execution engine, there is still something we can learn from it.

GPT-like models can benefit from iteration-level scheduling in the following ways:
- a request that has reached the `<EOS>` token can be returned to the client before the other requests in the batch finish
- new requests can join the batch without waiting for all the requests in the previous batch to finish
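To make the idea concrete, here is a minimal sketch of iteration-level scheduling (all names, the `decode_step` placeholder, and the token representation are hypothetical, not mosec's actual API): the scheduler reschedules after every decode iteration, returning finished requests immediately and admitting queued ones mid-batch.

```python
from collections import deque

EOS = -1        # hypothetical end-of-sequence token id
MAX_BATCH = 4   # hypothetical batch-size limit

def decode_step(request):
    # Placeholder for one forward pass of the model for this request.
    # Here each request just holds a list of tokens still to emit.
    return request["tokens"].pop(0)

def serve(waiting: deque, max_batch: int = MAX_BATCH):
    """Iteration-level scheduling: reschedule after every decode step."""
    active, finished = [], []
    while active or waiting:
        # Admit new requests at every iteration, not only at batch boundaries.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode iteration over the whole batch.
        for req in list(active):
            tok = decode_step(req)
            req["output"].append(tok)
            if tok == EOS:
                # Return this request immediately; the others keep decoding.
                active.remove(req)
                finished.append(req)
    return finished
```

With requests of different lengths, the shorter one completes and leaves the batch while the longer one continues, and its slot can be reused by a waiting request in the very next iteration.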
Refer to:
- Orca: https://www.usenix.org/conference/osdi22/presentation/yu
- BatchMaker: https://cs.nyu.edu/~lingfan/resources/batchmaker-eurosys18.pptx
- text-generation-inference continuous batching: https://github.com/huggingface/text-generation-inference/blob/main/router/README.md#continuous-batching