Alex Cheema

Results 117 issues of Alex Cheema

Users want to know exactly which setups work, how to set them up, and what the benchmarks are. A simple benchmark we can do is Mac Minis. We have 4...

documentation

**Motivation:** The goal of exo is to support any device in any setting. Radio is useful in settings with low connectivity, e.g. ships. **What:** exo supports networking modules, which consist...

good first issue

With the new shard download, we have Content-Range resumable downloads with integrity checks, so we should be able to give a list of candidate download URLs (in order of priority)...
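The idea of trying candidate URLs in priority order while resuming a partial file could be sketched roughly as below. This is not exo's shard downloader. The names `http_fetch_range`, `sha256_of` and `download_with_fallback` are hypothetical, SHA-256 is assumed as the integrity check, and the whole remainder is read into memory for brevity.

```python
import hashlib
import os
import urllib.request
from typing import Callable, Iterable, Optional


def http_fetch_range(url: str, start: int) -> bytes:
    """Fetch bytes from offset `start` to the end using an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        # 206 means the server honoured the range; 200 means it sent the whole file.
        if start > 0 and resp.status != 206:
            raise OSError("server ignored the Range header")
        return resp.read()


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB blocks so large shards are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def download_with_fallback(
    urls: Iterable[str],
    dest: str,
    expected_sha256: str,
    fetch: Callable[[str, int], bytes] = http_fetch_range,
) -> Optional[str]:
    """Try each URL in priority order, resuming from whatever is already on disk.

    Returns the URL that completed the file, or None if it was already complete.
    """
    if os.path.exists(dest) and sha256_of(dest) == expected_sha256:
        return None
    for url in urls:
        start = os.path.getsize(dest) if os.path.exists(dest) else 0
        try:
            chunk = fetch(url, start)
        except OSError:
            continue  # this candidate failed; keep the partial file, try the next
        with open(dest, "ab") as f:
            f.write(chunk)
        if sha256_of(dest) == expected_sha256:
            return url
        os.remove(dest)  # integrity check failed; restart from zero on the next URL
    raise RuntimeError("all candidate URLs failed")
```

Passing `fetch` as a parameter is only a convenience for this sketch: it lets the fallback logic be exercised without a network.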

Add a setting to enable logs and other debug information at runtime.

- should already be supported
- just check that it prioritises thunderbolt over WiFi
- Thanks apple for making thunderbolt usable ![IMG_0094](https://github.com/user-attachments/assets/6b025b99-4bdd-4aa5-bf1c-d6ae2e7fd720)

enhancement
help wanted

- tokens / sec ![IMG_0077](https://github.com/user-attachments/assets/e3e9cdad-83a2-44c6-8486-db7666d84cdd)
- memory usage
- gpu utilisation
- bytes sent / received
- num errors
- MFU (great metric. see e.g. https://x.com/__tinygrad__/status/1814519105346810038)

enhancement

Perhaps something like https://github.com/tinygrad/tinygrad/blob/master/examples/llama3.py -- this doesn't prefill the part of the prompt that's already been filled, and it's super simple to implement.

enhancement
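Skipping the part of the prompt that has already been prefilled comes down to comparing the cached tokens with the new prompt. The sketch below is not taken from the linked tinygrad example. The function names are hypothetical, and keeping at least one token to run (so the model still produces logits) is an assumption of this sketch.

```python
def common_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Count how many leading tokens the cache and the new prompt share."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_prefill(cached: list[int], prompt: list[int]) -> tuple[int, list[int]]:
    """Return (start_pos, tokens) where only `tokens` still need a forward pass."""
    if not prompt:
        return 0, []
    start = common_prefix_len(cached, prompt)
    # Assumption: always leave at least one token to run, even on a full cache hit.
    start = min(start, len(prompt) - 1)
    return start, prompt[start:]
```

With cached tokens `[1, 2, 3]` and a new prompt `[1, 2, 3, 4, 5]`, only `[4, 5]` would be prefilled, starting at position 3.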

This is our placement algorithm for pipeline parallelism: https://github.com/exo-explore/exo/blob/abaeb0323d4182f7bc4dd3775a8ba9209117d1cf/src/exo/master/placement_utils.py#L52-L100 It places a number of layers proportional to the memory available on each machine. This is not optimal. In order to...

enhancement
good first issue
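For reference, memory-proportional layer placement as described in the pipeline parallelism issue above can be sketched as follows. This is not the code at the linked `placement_utils.py`. The function name `allocate_layers` is hypothetical, and the largest-remainder rounding is an assumption chosen so the counts always sum to the total.

```python
def allocate_layers(total_layers: int, memory_bytes: list[int]) -> list[int]:
    """Split `total_layers` across nodes in proportion to each node's memory."""
    if total_layers < 0 or not memory_bytes or any(m < 0 for m in memory_bytes):
        raise ValueError("need a non-negative layer count and non-negative memory sizes")
    total_memory = sum(memory_bytes)
    if total_memory == 0:
        raise ValueError("at least one node must have memory available")

    # Integer arithmetic avoids float rounding: floor share plus its remainder.
    counts = [total_layers * m // total_memory for m in memory_bytes]
    remainders = [total_layers * m % total_memory for m in memory_bytes]

    # Hand the layers lost to flooring to the nodes with the largest remainders.
    leftover = total_layers - sum(counts)
    by_remainder = sorted(range(len(memory_bytes)), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts
```

For example, `allocate_layers(32, [16, 8, 8])` gives the node with twice the memory twice the layers: `[16, 8, 8]`. As the issue notes, this kind of split looks only at memory, so it is not necessarily optimal.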