Alex Cheema

Results 117 issues of Alex Cheema

- Right now device capabilities are statically defined.
- It makes more sense for this to be dynamic, since available resources and utilisation can change.
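A minimal sketch of what "dynamic" could mean here, assuming a capability record that is re-probed once it goes stale. `DeviceSnapshot`, `probe_device` and `DynamicCapabilities` are hypothetical names, not part of exo. The probe uses only the standard library (CPU count and free disk space) as a stand-in; a real probe would more likely report memory and utilisation, for example via `psutil.virtual_memory()`.

```python
import os
import shutil
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSnapshot:
    """Hypothetical record of what a device can offer at one point in time."""
    cpu_count: int
    free_disk_bytes: int
    sampled_at: float


def probe_device(path: str = ".") -> DeviceSnapshot:
    """Measure resources at call time instead of reading a static table."""
    return DeviceSnapshot(
        cpu_count=os.cpu_count() or 1,
        free_disk_bytes=shutil.disk_usage(path).free,
        sampled_at=time.monotonic(),
    )


class DynamicCapabilities:
    """Caches a snapshot and re-probes once it is older than max_age_s."""

    def __init__(self, probe=probe_device, max_age_s: float = 5.0):
        self._probe = probe
        self._max_age_s = max_age_s
        self._snapshot = None

    def current(self) -> DeviceSnapshot:
        now = time.monotonic()
        stale = (
            self._snapshot is None
            or now - self._snapshot.sampled_at > self._max_age_s
        )
        if stale:
            # Resources may have changed since the last probe, so measure again.
            self._snapshot = self._probe()
        return self._snapshot
```

Callers would ask `DynamicCapabilities().current()` whenever they need capabilities, rather than reading a constant. The refresh interval is an arbitrary choice in this sketch.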

This will require some core changes to how distributed inference works, hence the higher bounty of $500. This would be a great contribution to exo.

- It's only used for tokenizers (processor for llava VLM)
- The tokenizers code is fuzzy and bloated as a result of this hard-to-understand AutoTokenizer
- Should be...

https://github.com/exo-explore/exo/issues/23#issuecomment-2241521048 Perhaps after each inference, we synchronise the full KV cache between all nodes. This should be fairly straightforward: we can broadcast the entire cache. This would allow for saving...
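The broadcast idea can be sketched roughly as follows. This is an illustration only: `Node` and `sync_full_kv_cache` are hypothetical names, the cache is modelled as a plain dict from layer index to a list of entries, and it assumes each layer is owned by exactly one node before the sync. A real implementation would move tensors over the network instead of copying lists.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a node; holds the cache for its own layers."""
    name: str
    kv_cache: dict = field(default_factory=dict)


def sync_full_kv_cache(nodes: list) -> dict:
    """Merge every node's cache shard, then give each node a full copy."""
    full = {}
    for node in nodes:
        for layer, entries in node.kv_cache.items():
            # Assumption: no two nodes hold the same layer before the sync.
            full[layer] = list(entries)
    for node in nodes:
        # Copy the lists so nodes do not share mutable state afterwards.
        node.kv_cache = {layer: list(entries) for layer, entries in full.items()}
    return full
```

After the call, every node holds the entries for all layers, at the cost of sending the entire cache to each node.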

enhancement

**Prerequisite:** https://github.com/exo-explore/exo/issues/1

**Motivation:** exo should use device resources as efficiently as possible. Current implementation underutilises available resources.

**What:** See https://pytorch.org/docs/stable/pipeline.html

**Reward:** $500 Bounty paid out with USDC on Ethereum, email...

enhancement

Right now FLOPs are displayed using a lookup. Often users are confused when it shows 0 FLOPs, so we should show an estimate of device FLOPs even if it's not...
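One way to avoid showing 0 is to fall back to a rough estimate when the lookup has no entry, and to mark the value as estimated. The sketch below assumes the standard theoretical-peak formula (cores × clock × FLOPs per cycle per core). The table contents, the function names and the default of 16 FLOPs per cycle are all placeholders, not values from exo.

```python
from dataclasses import dataclass

# Hypothetical lookup table: chip name -> FP32 TFLOPS. The entry is a placeholder.
KNOWN_TFLOPS = {"example-chip-a": 10.0}


@dataclass(frozen=True)
class FlopsReading:
    tflops: float
    estimated: bool  # True when the value did not come from the lookup table


def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores * GHz * FLOPs/cycle gives GFLOPS; divide for TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


def device_tflops(chip: str, cores: int, clock_ghz: float,
                  flops_per_cycle: int = 16) -> FlopsReading:
    """Use the lookup when possible; otherwise return a flagged estimate."""
    known = KNOWN_TFLOPS.get(chip)
    if known is not None:
        return FlopsReading(known, estimated=False)
    return FlopsReading(peak_tflops(cores, clock_ghz, flops_per_cycle), estimated=True)
```

The `estimated` flag lets a UI label the number as approximate instead of presenting it as a measured figure.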

ComfyUI is pretty awesome: https://github.com/comfyanonymous/ComfyUI We've had a request to integrate this. It would be really cool to build and run pipelines across multiple devices.

Right now, the `max_generate_tokens` option limits the total number of tokens a given request can return. The desired behaviour is that it should limit the number of tokens on a given...

![IMG_0084](https://github.com/user-attachments/assets/b52fd7a3-1b0d-48b5-9f61-afc5931e1d18)

enhancement