mosec
feat: support iteration-level scheduling
See also https://www.usenix.org/conference/osdi22/presentation/yu
Originally posted by @VoVAllen in https://github.com/mosecorg/mosec/issues/382#issuecomment-1588622255
Although Orca couples the scheduler and the execution engine, there is still something we can learn from it.

GPT-like models can benefit from iteration-level scheduling in the following ways:
- a request that has reached the `<EOS>` token can be returned to the client before the other requests in the batch finish
- new requests can join the batch without waiting for all the requests in the previous batch to finish
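To make the idea concrete, here is a minimal sketch of iteration-level scheduling (all names, the `decode_step` placeholder, and the token representation are hypothetical, not mosec's actual API): the scheduler reschedules after every decode iteration, returning finished requests immediately and admitting queued ones mid-batch.

```python
from collections import deque

EOS = -1        # hypothetical end-of-sequence token id
MAX_BATCH = 4   # hypothetical batch-size limit

def decode_step(request):
    # Placeholder for one forward pass of the model for this request.
    # Here each request just holds a list of tokens still to emit.
    return request["tokens"].pop(0)

def serve(waiting: deque, max_batch: int = MAX_BATCH):
    """Iteration-level scheduling: reschedule after every decode step."""
    active, finished = [], []
    while active or waiting:
        # Admit new requests at every iteration, not only at batch boundaries.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode iteration over the whole batch.
        for req in list(active):
            tok = decode_step(req)
            req["output"].append(tok)
            if tok == EOS:
                # Return this request immediately; the others keep decoding.
                active.remove(req)
                finished.append(req)
    return finished
```

With requests of different lengths, the shorter one completes and leaves the batch while the longer one continues, and its slot can be reused by a waiting request in the very next iteration.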
Refer to:
- Orca: https://www.usenix.org/conference/osdi22/presentation/yu
- BatchMaker: https://cs.nyu.edu/~lingfan/resources/batchmaker-eurosys18.pptx
- text-generation-inference continuous batching: https://github.com/huggingface/text-generation-inference/blob/main/router/README.md#continuous-batching