Alex Cheema

Results 117 issues of Alex Cheema

### Introduction

exo currently implements Pipeline Parallel inference. This splits up layers of a model over multiple devices and executes them sequentially, device-by-device. There are different ways we can split...
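
As a rough illustration of the idea described above, the sketch below splits a list of layers into contiguous ranges, one per device, and runs them in order. It is not exo's actual partitioning code: the even split, the helper names `partition_layers` and `run_pipeline`, and the toy "layers" are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def partition_layers(n_layers: int, n_devices: int) -> List[Tuple[int, int]]:
    """Return contiguous [start, end) layer ranges, one per device.

    Hypothetical helper: layers are split as evenly as possible, with the
    first devices taking one extra layer when the split is uneven.
    """
    base, extra = divmod(n_layers, n_devices)
    ranges = []
    start = 0
    for device in range(n_devices):
        size = base + (1 if device < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def run_pipeline(layers: Sequence[Layer], ranges: List[Tuple[int, int]], x: float) -> float:
    """Execute the layers sequentially, device by device."""
    for start, end in ranges:  # each range stands in for one device
        for layer in layers[start:end]:
            x = layer(x)  # the output of one device feeds the next
    return x


# Toy model: layer k simply adds k to its input.
layers = [lambda v, k=k: v + k for k in range(10)]
ranges = partition_layers(len(layers), 3)
result = run_pipeline(layers, ranges, 0.0)
print(ranges, result)
```

With 10 layers and 3 devices, the ranges come out as 4, 3, and 3 layers. Only one device is active at a time for a given request, which is the sequential behaviour the issue describes.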

- Currently exo supports multiple requests to the same LLM concurrently (after: https://github.com/exo-explore/exo/pull/282)
- However, if you try to request 2 different LLMs concurrently it fails

- Add the ability to specify a configuration file (in JSON or YAML or TOML format)
- When specified, this should use a new `ManualDiscovery` discovery module
- See existing...

### Background

We want to make exo as accessible as possible. The most accessible option would be one where you don't even have to install anything: you just go to your...

### Background

Right now exo requires installing Python, installing packages with `pip`, and then running the `exo` command. This is too difficult for people who are non-technical. A lot of...

- Currently tinychat download progress arbitrarily chooses the first element in the response: https://github.com/exo-explore/exo/blob/eade4fb62d3337df7f470b2edcb2294dde917b78/exo/tinychat/index.js#L314
- Instead we should show the progress of all nodes
- This might also change the...
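
One way to report every node rather than only the first could look like the sketch below. The response shape is an assumption: a hypothetical mapping from node id to `(downloaded_bytes, total_bytes)`. The function name `summarize_progress` is also made up for illustration, and the real tinychat code is JavaScript.

```python
from typing import Dict, Tuple


def summarize_progress(per_node: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each node id to its percent complete, plus an 'overall' entry.

    Assumes (hypothetically) that each node reports downloaded and total bytes.
    """
    summary: Dict[str, float] = {}
    done_total = 0
    size_total = 0
    for node_id, (downloaded, total) in per_node.items():
        # Guard against a node that has not reported a total size yet.
        summary[node_id] = 100.0 * downloaded / total if total else 0.0
        done_total += downloaded
        size_total += total
    summary["overall"] = 100.0 * done_total / size_total if size_total else 0.0
    return summary


progress = summarize_progress({"node-a": (50, 100), "node-b": (30, 300)})
print(progress)
```

The overall figure is weighted by bytes rather than averaged per node, so a node downloading a large shard counts for more than one downloading a small shard.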

See #305. Currently the fix is to shield `process_prompt`; however, this also means that if a ChatGPT API request times out, it never gets cancelled, even when a...
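
The side effect described above can be reproduced with a small `asyncio` sketch. It assumes the shielding is done with `asyncio.shield`; the `process_prompt` body, `handle_request`, and the timings are stand-ins rather than exo's real code.

```python
import asyncio
from typing import List

background = set()  # keep strong references to shielded tasks


async def process_prompt(log: List[str]) -> None:
    """Stand-in for the real work; takes longer than the request timeout."""
    await asyncio.sleep(0.05)
    log.append("finished")


async def handle_request(log: List[str], timeout: float) -> None:
    task = asyncio.ensure_future(process_prompt(log))
    background.add(task)
    task.add_done_callback(background.discard)
    try:
        # The timeout cancels only the outer shield future, not `task`.
        await asyncio.wait_for(asyncio.shield(task), timeout)
    except asyncio.TimeoutError:
        log.append("timed out")


async def main() -> List[str]:
    log: List[str] = []
    await handle_request(log, timeout=0.01)
    await asyncio.sleep(0.1)  # give the shielded task time to complete
    return log


log = asyncio.run(main())
print(log)
```

The request reports a timeout, yet the shielded task still runs to completion afterwards. Cancelling it would require calling `task.cancel()` explicitly in the timeout handler.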