[BOUNTY - $200] Batched Requests
Motivation: Batching multiple inference requests together can speed up inference. Batching can even be leveraged in single-request settings for speedups, e.g. with staged speculative decoding.
What: Currently, exo handles each inference request separately. This bounty is for batching inference requests together, so that multiple inputs can be passed through the model shards in a single forward pass.
Reward: $200 Bounty paid out with USDC on Ethereum, email [email protected]
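To make the idea concrete, here is a minimal sketch (not exo's actual API — `forward_shard`, `run_separately`, and `run_batched` are hypothetical names, and the "shard" is a toy linear layer) of the difference between per-request passes and a single batched pass:

```python
import numpy as np

def forward_shard(weights, x):
    # Toy "model shard": one linear layer + ReLU. x has shape (batch, d_in).
    return np.maximum(x @ weights, 0.0)

def run_separately(weights, requests):
    # Current behavior: one forward pass per request.
    return [forward_shard(weights, r[None, :])[0] for r in requests]

def run_batched(weights, requests):
    # Proposed behavior: stack the pending requests and do one pass.
    batch = np.stack(requests)           # (n_requests, d_in)
    out = forward_shard(weights, batch)  # (n_requests, d_out)
    return list(out)
```

The outputs are identical either way; the win is that one `(n, d_in) @ (d_in, d_out)` matmul amortizes kernel-launch and weight-load overhead across all pending requests.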
Working on this
@abdussamettrkr drop a comment here if you need any help / run into any issues. here to help
Is this still available? Are there also other bounties? Are they open to anyone?
@AlexCheema Please see my previous question.
I'd like to share an exo-like project I built that supports batched requests:
https://github.com/wnma3mz/tLLM
@AlexCheema Is this still available?
Opened an initial PR adding batched sampling to reduce per-token overhead when multiple requests are active: https://github.com/exo-explore/exo/pull/891
This is a safe incremental step toward full forward-pass batching across shards. Happy to iterate on batched forward (combining per-request caches) next.
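For readers following along, batched sampling in this sense can be sketched roughly as below (an illustration, not the PR's actual code): stack the per-request logits and sample all next tokens in one vectorized operation, here via Gumbel-max, instead of looping over requests.

```python
import numpy as np

def sample_batched(logits, temperature=1.0, rng=None):
    # logits: (n_requests, vocab_size) -- one row per active request.
    # Vectorized temperature scaling + Gumbel-max sampling replaces a
    # per-request sampling loop, reducing per-token overhead.
    rng = rng or np.random.default_rng()
    scaled = logits / max(temperature, 1e-6)
    gumbel = -np.log(-np.log(rng.random(scaled.shape)))
    return np.argmax(scaled + gumbel, axis=-1)  # (n_requests,)
```

As `temperature` approaches 0 this degenerates to greedy argmax per row, which is a handy sanity check.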
@AlexCheema I want to actually work on this issue but please confirm if it is still available
@AlexCheema I'm interested in working on this bounty. Is it still active, and may I begin?
??