
[BOUNTY - $200] Batched Requests

Open · AlexCheema opened this issue 1 year ago · 10 comments

Motivation: Batching multiple inference requests together can speed up inference. Batching can be leveraged even in single-input settings, for example with staged speculative decoding.
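To show why batching helps even with a single input, here is a minimal sketch of plain greedy speculative decoding. Staged speculative decoding is a more elaborate variant with tree-shaped drafts and multiple draft stages, and it is not what is implemented here. A cheap draft model proposes `k` tokens, and the target model checks all of them in one forward pass over `prefix + draft`. The functions `draft_next` and `target_greedy_all` are hypothetical stand-ins for model calls, not exo APIs.

```python
from typing import Callable, List

# Hypothetical model interfaces (assumptions for illustration, not exo APIs):
#   draft_next(tokens) -> next token id from a cheap draft model
#   target_greedy_all(tokens) -> list whose entry j is the target model's greedy
#     prediction for the token after tokens[: j + 1]; one call == one forward pass
DraftFn = Callable[[List[int]], int]
TargetFn = Callable[[List[int]], List[int]]


def speculative_step(prefix: List[int], draft_next: DraftFn,
                     target_greedy_all: TargetFn, k: int = 4) -> List[int]:
    """One greedy speculative-decoding step; returns the tokens to append to prefix.

    Assumes prefix is non-empty.
    """
    # 1. Draft k tokens autoregressively with the cheap model.
    draft: List[int] = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))
    # 2. Verify every drafted position with a single target-model pass.
    preds = target_greedy_all(prefix + draft)
    accepted: List[int] = []
    for i, tok in enumerate(draft):
        # draft[i] sits at position len(prefix) + i, predicted by the entry before it.
        target_tok = preds[len(prefix) - 1 + i]
        if tok != target_tok:
            # First disagreement: keep the target's token instead and stop.
            accepted.append(target_tok)
            return accepted
        accepted.append(tok)
    # All k drafts accepted: the same pass also yields one extra "bonus" token.
    accepted.append(preds[len(prefix) - 1 + k])
    return accepted
```

With a draft model that always agrees with the target, one target pass yields `k + 1` tokens instead of one. On a disagreement, the step still makes at least one token of progress.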

What: Currently, exo handles each inference request separately. This bounty is for batching inference requests together, so that multiple inputs can pass through the model shards together in a single forward pass.
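Here is a minimal sketch of what "multiple inputs in a single pass" involves, under simplifying assumptions. Requests of different lengths are left-padded into one rectangular batch with a 0/1 mask. A shard processes the whole batch once, and the padding is then stripped per request. `toy_shard` is a hypothetical stand-in for a model shard (a causal, masked running mean rather than real attention), and it does not reflect exo's actual shard interface. The final check confirms that the batched pass gives the same per-request outputs as running each request on its own.

```python
from typing import List, Tuple

PAD = 0.0  # placeholder for padded positions; the mask ensures it is never used as data


def pad_batch(requests: List[List[float]]) -> Tuple[List[List[float]], List[List[int]]]:
    """Left-pad variable-length requests into one rectangular batch plus a 0/1 mask."""
    width = max(len(r) for r in requests)
    batch: List[List[float]] = []
    mask: List[List[int]] = []
    for r in requests:
        pad = width - len(r)
        batch.append([PAD] * pad + list(r))
        mask.append([0] * pad + [1] * len(r))
    return batch, mask


def toy_shard(batch: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Hypothetical stand-in for a model shard: causal, masked running mean per row.

    Each position only "attends" to valid (mask == 1) positions at or before it,
    so padding never leaks into a request's outputs.
    """
    out: List[List[float]] = []
    for row, m in zip(batch, mask):
        total, count, row_out = 0.0, 0, []
        for x, valid in zip(row, m):
            if valid:
                total += x
                count += 1
            row_out.append(total / count if count else PAD)
        out.append(row_out)
    return out


def unpad(batch_out: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Drop padded positions so each request gets back only its own outputs."""
    return [[x for x, valid in zip(row, m) if valid] for row, m in zip(batch_out, mask)]


requests = [[1.0, 2.0, 3.0], [4.0]]
batch, mask = pad_batch(requests)
batched = unpad(toy_shard(batch, mask), mask)                      # one pass, both requests
separate = [toy_shard([r], [[1] * len(r)])[0] for r in requests]   # one pass per request
print(batched)  # [[1.0, 1.5, 2.0], [4.0]]
assert batched == separate
```

In a real multi-node setup, the padded batch and its mask would travel together from shard to shard. The mask has to be honored inside every attention layer, not just at the end.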

Reward: $200 bounty, paid out in USDC on Ethereum. Email [email protected]

AlexCheema, Jul 15 '24

Working on this

aturker1, Jul 18 '24

@abdussamettrkr Drop a comment here if you need any help or run into any issues. Here to help.

AlexCheema, Jul 18 '24

Is this still available? Are there also other bounties? Are they open to anyone?

llvee, Sep 03 '24

@AlexCheema Please see my previous question.

llvee, Oct 14 '24

I'd like to share an exo-like project I built that supports batched requests:

https://github.com/wnma3mz/tLLM

wnma3mz, Dec 20 '24

@AlexCheema Is this still available?

Omar8345, Apr 11 '25

Opened an initial PR adding batched sampling to reduce per-token overhead when multiple requests are active: https://github.com/exo-explore/exo/pull/891

This is a safe incremental step toward full forward-pass batching across shards. Happy to iterate on a batched forward pass (combining per-request caches) next.
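The sketch below illustrates batched sampling: choosing the next token for every active request in one call instead of one call per request. It is not the code from the PR linked above. `sample_batch` and its parameters are hypothetical. Each request may use its own temperature, and a temperature of 0 or below means greedy decoding. In pure Python the rows are still visited in a loop. The per-token savings come when the loop is replaced by a single vectorized operation over a stacked logits array in the inference engine's tensor library.

```python
import math
import random
from typing import List, Sequence


def sample_batch(logits: Sequence[Sequence[float]],
                 temperatures: Sequence[float],
                 rng: random.Random) -> List[int]:
    """Choose the next token for every active request in one call.

    logits[i] holds request i's vocabulary logits; temperatures[i] <= 0 means greedy.
    """
    next_tokens: List[int] = []
    for row, temp in zip(logits, temperatures):
        if temp <= 0.0:
            # Greedy: index of the largest logit.
            next_tokens.append(max(range(len(row)), key=row.__getitem__))
            continue
        # Temperature-scaled softmax weights, shifted by the max for numerical stability.
        scaled = [x / temp for x in row]
        top = max(scaled)
        weights = [math.exp(x - top) for x in scaled]
        # random.choices normalizes the weights itself, so no explicit division is needed.
        next_tokens.append(rng.choices(range(len(row)), weights=weights, k=1)[0])
    return next_tokens


rng = random.Random(0)
print(sample_batch([[0.1, 2.0, -1.0], [5.0, 1.0]], [0.0, 0.0], rng))  # [1, 0]
```

Passing an explicit `random.Random` keeps sampling reproducible per batch. That makes it easier to check that batched and unbatched runs agree.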

Bennethxyz, Oct 22 '25

@AlexCheema I'd like to work on this issue. Could you please confirm whether it is still available?

Omar8345, Oct 25 '25

@AlexCheema I’m interested in working on this bounty. Is it still active, and may I begin?

rishi-jat, Nov 13 '25

??

rishi-jat, Nov 28 '25