Alex Cheema
- Right now, device capabilities are statically defined. It makes more sense for them to be dynamic, since available resources and utilisation can change over time.
This will require some core changes to how distributed inference works, hence the higher bounty of $500. It would be a great contribution to exo.
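A minimal sketch of what "dynamic" could look like: re-probe capabilities on an interval instead of caching them once at startup. The `DeviceCapabilities` fields, `CapabilityMonitor` class, and refresh interval here are illustrative assumptions, not exo's actual API.

```python
import dataclasses
import os
import shutil
import time


@dataclasses.dataclass
class DeviceCapabilities:
    # Hypothetical capability fields; live figures, not install-time constants.
    disk_free_bytes: int
    cpu_count: int


def probe_capabilities() -> DeviceCapabilities:
    # Read current values each call so topology decisions track real usage.
    return DeviceCapabilities(
        disk_free_bytes=shutil.disk_usage("/").free,
        cpu_count=os.cpu_count() or 1,
    )


class CapabilityMonitor:
    """Refresh capabilities on a fixed interval instead of defining them once."""

    def __init__(self, interval_s: float = 30.0):
        self.interval_s = interval_s
        self._last_probe = time.monotonic()
        self._caps = probe_capabilities()

    def capabilities(self) -> DeviceCapabilities:
        now = time.monotonic()
        if now - self._last_probe >= self.interval_s:
            self._caps = probe_capabilities()
            self._last_probe = now
        return self._caps
```

Callers would ask the monitor rather than a static table, so a node whose disk fills up or whose load spikes is re-weighted on the next probe.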
- It's only used for tokenizers (and the processor for the llava VLM). The tokenizer code is fuzzy and bloated as a result, and this use of AutoTokenizer is hard to understand. Should be...
https://github.com/exo-explore/exo/issues/23#issuecomment-2241521048 Perhaps after each inference, we synchronise the full kv cache between all nodes. This should be fairly straightforward: we can broadcast the entire cache. This would allow for saving...
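A rough sketch of the broadcast step, assuming the kv cache can be treated as a serialisable per-layer mapping. The `Peer.send` transport, `FakePeer`, and the cache layout are all hypothetical stand-ins for exo's real networking layer.

```python
import pickle


def serialize_kv_cache(cache: dict) -> bytes:
    # Flatten the whole per-layer cache into one payload so a single
    # broadcast after inference is enough to bring every node up to date.
    return pickle.dumps(cache)


def broadcast_kv_cache(cache: dict, peers: list) -> None:
    payload = serialize_kv_cache(cache)
    for peer in peers:
        peer.send(payload)  # hypothetical transport call


class FakePeer:
    """In-memory peer used only to illustrate the round-trip."""

    def __init__(self):
        self.received = None

    def send(self, payload: bytes) -> None:
        self.received = pickle.loads(payload)
```

In practice the full cache can be large, so a real implementation would likely want compression or delta updates rather than re-sending everything each turn.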
**Prerequisite:** https://github.com/exo-explore/exo/issues/1 **Motivation:** exo should use device resources as efficiently as possible; the current implementation underutilises available resources. **What:** See https://pytorch.org/docs/stable/pipeline.html **Reward:** $500 bounty, paid out in USDC on Ethereum; email...
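The core idea behind the linked pipeline-parallelism docs can be sketched without any ML framework: split the model into stages and feed micro-batches through them so that while one stage works on micro-batch i, the previous stage is already processing i+1. Threads and queues below stand in for devices, and the stage functions are illustrative assumptions.

```python
import queue
import threading


def run_pipeline(stages, microbatches):
    """Run each micro-batch through every stage, with stages overlapping.

    stages: list of callables, one per pipeline stage ("device").
    microbatches: list of inputs fed through the pipeline in order.
    """
    qs = [queue.Queue() for _ in range(len(stages) + 1)]
    done = object()  # sentinel marking the end of the stream

    def worker(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is done:
                q_out.put(done)
                return
            q_out.put(fn(item))

    threads = [
        threading.Thread(target=worker, args=(fn, qs[i], qs[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for mb in microbatches:
        qs[0].put(mb)
    qs[0].put(done)

    out = []
    while True:
        item = qs[-1].get()
        if item is done:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out
```

FIFO queues preserve micro-batch order, and because each stage only waits on its own input queue, every stage stays busy once the pipeline fills; that overlap is the efficiency win the issue is after.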
Right now, FLOPs are displayed using a lookup table. Users are often confused when it shows 0 FLOPs, so we should show an estimate of device FLOPs even if it's not...
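One possible fallback, sketched here as an assumption rather than exo's actual approach: when the lookup table has no entry, time a small matmul and report the measured rate instead of 0. Pure-Python matmul is only a stand-in; a real probe would benchmark the device's tensor library.

```python
import time


def estimate_flops(n: int = 64) -> float:
    # Naive n x n matmul: 2*n**3 floating-point ops (a multiply and an add
    # per inner-product term). Timing it gives a crude FLOPS estimate.
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    start = time.perf_counter()
    c = [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    elapsed = time.perf_counter() - start
    assert c[0][0] == n  # sanity check: dot product of all-ones rows
    return (2 * n**3) / elapsed


def device_flops(lookup: dict, model: str) -> float:
    # Prefer the lookup table, but never show 0: fall back to the estimate.
    return lookup.get(model) or estimate_flops()
```

Even a rough measured number beats displaying 0 for unrecognised hardware, and the lookup table still wins whenever an entry exists.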
ComfyUI is pretty awesome: https://github.com/comfyanonymous/ComfyUI We've had a request to integrate it. It would be really cool to build and run pipelines across multiple devices.
Right now, `max_generate_tokens` option limits the total number of tokens a given request can return. The desired behaviour is that it should limit the number of tokens on a given...
