
feat: support iteration level scheduling

Open kemingy opened this issue 1 year ago • 1 comments

          Also https://www.usenix.org/conference/osdi22/presentation/yu

Originally posted by @VoVAllen in https://github.com/mosecorg/mosec/issues/382#issuecomment-1588622255

kemingy avatar Jun 13 '23 06:06 kemingy

Although Orca couples the scheduler with the execution engine, there is still something we can learn from it.

GPT-like models can benefit from iteration-level scheduling in two ways:

  1. A request that has reached <EOS> can be returned to the client before the other requests in its batch finish.
  2. New requests can join the batch without waiting for every request in the previous batch to finish.
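
To make the two points concrete, here is a minimal sketch of an iteration-level scheduling loop. It is not mosec's or Orca's implementation: `Request`, `step`, and `run` are hypothetical names, and the "model" is a canned token list per request that is assumed to end with `<EOS>`. It only shows that scheduling decisions happen after every decoding iteration instead of after every batch.

```python
from collections import deque
from dataclasses import dataclass, field

EOS = "<EOS>"


@dataclass
class Request:
    rid: str
    # Hypothetical stand-in for model output: the tokens this request will
    # generate, assumed to end with EOS.
    tokens: list
    output: list = field(default_factory=list)


def step(req):
    """Stand-in for one decoding iteration: emit the request's next token."""
    tok = req.tokens[len(req.output)]
    req.output.append(tok)
    return tok


def run(waiting, max_batch_size):
    """Run until all requests finish; return (iteration, rid) in completion order."""
    running = []
    finished = []
    iteration = 0
    while waiting or running:
        # Point 2: admit new requests as soon as a slot is free,
        # without waiting for the rest of the batch to finish.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        iteration += 1
        still_running = []
        for req in running:
            if step(req) == EOS:
                # Point 1: a finished request leaves the batch immediately
                # and could be returned to the client here.
                finished.append((iteration, req.rid))
            else:
                still_running.append(req)
        running = still_running
    return finished


requests = deque([
    Request("a", ["x", EOS]),
    Request("b", ["x", "x", "x", "x", EOS]),
    Request("c", ["x", EOS]),
])
print(run(requests, max_batch_size=2))
```

With a batch size of 2, request `a` leaves after iteration 2 and `c` takes its slot in iteration 3, so `c` finishes in iteration 4 while `b` is still decoding. Under request-level batching, `c` could not start until `b` finished in iteration 5.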

References:

  • Orca: https://www.usenix.org/conference/osdi22/presentation/yu
  • BatchMaker: https://cs.nyu.edu/~lingfan/resources/batchmaker-eurosys18.pptx
  • text-generation-inference continuous batching: https://github.com/huggingface/text-generation-inference/blob/main/router/README.md#continuous-batching

kemingy avatar Jun 13 '23 06:06 kemingy