Alex Cheema
Users want to know exactly which setups work, how to set them up, and what the benchmarks are. A simple benchmark we can do is Mac Minis. We have 4...
**Motivation:** The goal of exo is to support any device in any setting. Radio is useful for settings with low connectivity e.g. ships. **What:** exo supports networking modules, which consist...
With the new shard download, we have resumable downloads (via Content-Range) with integrity checks, so we should be able to supply a list of candidate download URLs (in priority order)...
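A minimal sketch of the idea, not exo's actual downloader: try each candidate URL in priority order, resume from the bytes already received, and accept the result only if its SHA-256 matches. The names `download_with_fallback` and `http_fetch` are hypothetical, and using SHA-256 as the integrity check is an assumption.

```python
import hashlib
import urllib.request
from typing import Callable, Iterator, Sequence


def http_fetch(url: str, offset: int, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the bytes of `url` from `offset` onwards using an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        # 206 means the server honoured the range; a 200 would resend the whole file.
        if offset and resp.status != 206:
            raise OSError(f"{url} ignored the Range header")
        while chunk := resp.read(chunk_size):
            yield chunk


def download_with_fallback(
    urls: Sequence[str],
    expected_sha256: str,
    fetch: Callable[[str, int], Iterator[bytes]] = http_fetch,
) -> bytes:
    """Try `urls` in priority order, resuming from bytes already downloaded."""
    data = bytearray()
    for url in urls:
        try:
            for chunk in fetch(url, len(data)):
                data += chunk
        except OSError:
            continue  # keep the partial data and resume from the next candidate
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return bytes(data)
        data.clear()  # integrity check failed: start over on the next candidate
    raise OSError("all candidate URLs failed or produced corrupt data")
```

Passing `fetch` as a parameter keeps the fallback logic testable without a network; a real implementation would stream to disk rather than hold the shard in memory.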
Add a setting to enable logs and other debug information at runtime.
- should already be supported
- just check that it prioritises Thunderbolt over WiFi
- Thanks Apple for making Thunderbolt usable
- tokens / sec
- memory usage
- GPU utilisation
- bytes sent / received
- num errors
- MFU (great metric. see e.g. https://x.com/__tinygrad__/status/1814519105346810038)
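For reference, MFU (model FLOPs utilisation) can be estimated from tokens / sec. The sketch below assumes the common approximation of about 2 FLOPs per parameter per generated token for a dense transformer forward pass (attention cost ignored); the function name and the example numbers are illustrative, not measurements.

```python
def model_flops_utilization(
    tokens_per_sec: float,
    n_params: float,
    peak_flops_per_sec: float,
) -> float:
    """Estimate MFU as achieved FLOPs/s divided by the hardware's peak FLOPs/s."""
    # Assumption: a forward pass costs roughly 2 FLOPs per parameter per token.
    flops_per_token = 2.0 * n_params
    achieved_flops_per_sec = tokens_per_sec * flops_per_token
    return achieved_flops_per_sec / peak_flops_per_sec


# Illustrative numbers only: an 8e9-parameter model at 10 tokens/sec
# on a device with a 1.6e12 FLOPs/s peak.
mfu = model_flops_utilization(10, 8e9, 1.6e12)
print(f"MFU: {mfu:.1%}")
```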
A promising idea from the community:
Perhaps something like https://github.com/tinygrad/tinygrad/blob/master/examples/llama3.py, which skips prefilling the part of the prompt that has already been filled. It's super simple to implement.
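A rough sketch of that prefix-reuse idea, written independently of the linked example: compare the tokens already in the KV cache with the new prompt and prefill only the part after their common prefix. The function names are hypothetical, and how the cache itself is truncated to `start` is left to the inference engine.

```python
from typing import List, Sequence, Tuple


def common_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Number of leading tokens shared by the cached sequence and the new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_prefill(cached: Sequence[int], prompt: Sequence[int]) -> Tuple[int, List[int]]:
    """Return (start position, tokens that still need to be prefilled)."""
    if not prompt:
        return 0, []
    start = common_prefix_len(cached, prompt)
    # Always process at least the last token so the model produces fresh logits.
    start = min(start, len(prompt) - 1)
    return start, list(prompt[start:])
```

With `cached = [1, 2, 3, 4]` and `prompt = [1, 2, 3, 9, 10]`, only `[9, 10]` is prefilled, starting at position 3.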
This is our placement algorithm for pipeline parallelism: https://github.com/exo-explore/exo/blob/abaeb0323d4182f7bc4dd3775a8ba9209117d1cf/src/exo/master/placement_utils.py#L52-L100 It places a number of layers proportional to the memory available on each machine. This is not optimal. In order to...
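To illustrate what "proportional to the memory available" means, here is a minimal sketch. It is not the code at the linked commit: the function name is hypothetical, and the largest-remainder rounding is an assumption chosen so that the layer counts always sum to the total.

```python
from typing import List, Sequence


def proportional_layer_split(memory_bytes: Sequence[int], n_layers: int) -> List[int]:
    """Assign layers to machines in proportion to each machine's available memory."""
    total = sum(memory_bytes)
    if total <= 0:
        raise ValueError("total available memory must be positive")
    # Ideal (fractional) share of layers for each machine.
    exact = [n_layers * m / total for m in memory_bytes]
    counts = [int(x) for x in exact]
    leftover = n_layers - sum(counts)
    # Hand the remaining layers to the machines with the largest fractional parts.
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Three machines with 16, 8 and 8 units of memory sharing 32 layers.
print(proportional_layer_split([16, 8, 8], 32))
```

A split like this looks only at memory capacity and ignores per-device compute and link bandwidth.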