[BOUNTY - $200] Batched Requests
Motivation: Batching multiple inference requests together can speed up inference. Batching can even be leveraged in single-request settings for speedups, e.g. with staged speculative decoding.
What: Currently, exo handles each inference request separately. This bounty is for batching inference requests together, so that multiple inputs can be passed through the model shards in a single forward pass.
Reward: $200 Bounty paid out with USDC on Ethereum, email [email protected]
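To make the idea concrete, here is a minimal sketch (not exo's actual API — `forward_shard`, `run_separately`, and `run_batched` are hypothetical names, and the "shard" is a toy linear layer) of the difference between per-request passes and a single batched pass:

```python
import numpy as np

def forward_shard(weights, x):
    # Toy "model shard": one linear layer + ReLU. x has shape (batch, d_in).
    return np.maximum(x @ weights, 0.0)

def run_separately(weights, requests):
    # Current behavior: one forward pass per request.
    return [forward_shard(weights, r[None, :])[0] for r in requests]

def run_batched(weights, requests):
    # Proposed behavior: stack the pending requests and do one pass.
    batch = np.stack(requests)           # (n_requests, d_in)
    out = forward_shard(weights, batch)  # (n_requests, d_out)
    return list(out)
```

The outputs are identical either way; the win is that one `(n, d_in) @ (d_in, d_out)` matmul amortizes kernel-launch and weight-load overhead across all pending requests.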
Working on this
@abdussamettrkr drop a comment here if you need any help / run into any issues. here to help
Is this still available? Are there also other bounties? Are they open to anyone?
@AlexCheema Please see my previous question.
I'd like to share an exo-like project I built that supports batched requests:
https://github.com/wnma3mz/tLLM
@AlexCheema Is this still available?
Opened an initial PR adding batched sampling to reduce per-token overhead when multiple requests are active: https://github.com/exo-explore/exo/pull/891
This is a safe incremental step toward full forward-pass batching across shards. Happy to iterate on batched forward (combining per-request caches) next.
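For readers following along, batched sampling in this sense can be sketched roughly as below (an illustration, not the PR's actual code): stack the per-request logits and sample all next tokens in one vectorized operation, here via Gumbel-max, instead of looping over requests.

```python
import numpy as np

def sample_batched(logits, temperature=1.0, rng=None):
    # logits: (n_requests, vocab_size) -- one row per active request.
    # Vectorized temperature scaling + Gumbel-max sampling replaces a
    # per-request sampling loop, reducing per-token overhead.
    rng = rng or np.random.default_rng()
    scaled = logits / max(temperature, 1e-6)
    gumbel = -np.log(-np.log(rng.random(scaled.shape)))
    return np.argmax(scaled + gumbel, axis=-1)  # (n_requests,)
```

As `temperature` approaches 0 this degenerates to greedy argmax per row, which is a handy sanity check.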
@AlexCheema I want to actually work on this issue but please confirm if it is still available
@AlexCheema I'm interested in working on this bounty. Is it still active, and may I begin?
??