Example setups with benchmarks
Users want to know exactly which setups work, how to set them up, and what the benchmarks are.
A simple benchmark we can run uses Mac minis. We have 4 of them, so we can progressively add Mac minis to the cluster and measure tok/sec at each step.
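A minimal sketch of the timing side of that benchmark, independent of how you talk to exo: wrap whatever token stream your client yields in a helper that reports time-to-first-token and tokens/sec after the first token (the two numbers requested below). The function name and interface here are my own suggestion, not part of exo.

```python
import time
from typing import Iterable, Tuple

def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Consume a token stream; return (ttft_s, tok_per_sec, n_tokens).

    tok/sec is computed over tokens *after* the first one, so it is not
    skewed by prompt processing / time-to-first-token.
    """
    start = time.perf_counter()
    first = None
    last = start
    n = 0
    for _ in tokens:
        now = time.perf_counter()
        if first is None:
            first = now  # first token arrived
        last = now
        n += 1
    if n == 0 or first is None:
        return (0.0, 0.0, 0)
    ttft = first - start
    gen_time = last - first
    tps = (n - 1) / gen_time if gen_time > 0 else float("inf")
    return (ttft, tps, n)
```

Feed it the streaming response from whatever API endpoint your exo version exposes; repeat per cluster size (1, 2, 3, 4 Mac minis) and record the numbers.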
Ref:
Request: Raspberry Pi. We could bundle it with a Coral USB TPU (https://coral.ai/products/), which could make for super cost-effective home AI inference.
List of supported hardware:
2x4090 on exo vs bunch of 4090s in one pc
Coral is a bit hard to get hold of nowadays, though.
There is also this partnership: https://www.raspberrypi.com/news/raspberry-pi-ai-kit-available-now-at-70/
The people want to know how fast it is.
Yes, ideally, I'd like to know:
- Cluster details
- Model used, including quantization bit width
- Settings used, like context window size, caching settings
- Cold start or 2nd+ run
- Prompt used
- Time to first token
- Tokens/sec after first token
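The checklist above could be captured in a small report structure so every submitted benchmark has the same fields and can be pasted into a results table. This is a sketch; the field names and the markdown-row format are my own suggestion.

```python
from dataclasses import dataclass, asdict

@dataclass
class BenchmarkReport:
    cluster: str        # e.g. "4x Mac mini M4 base, Thunderbolt"
    model: str          # model name
    bits: int           # quantization bit width
    context_window: int
    caching: str        # KV-cache / prompt-cache settings
    cold_start: bool    # True = cold start, False = 2nd+ run
    prompt: str
    ttft_s: float       # time to first token, seconds
    tok_per_sec: float  # tokens/sec after the first token

    def markdown_row(self) -> str:
        # One row for a shared results table, fields in declaration order.
        return "| " + " | ".join(str(v) for v in asdict(self).values()) + " |"
```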
Related to the benchmarking question: my basic understanding is that the cluster cannot exceed the speed of the fastest single machine at processing a layer, correct?
E.g., if I can run the model on a single base-model M4 mini, adding a second M4 mini will increase the cluster's total throughput (e.g., it can serve 2 simultaneous requests), but a single request will run at the same speed as before. Is that correct?
In my experience, adding a node halves the total tokens/sec throughput :(
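The intuition in the two comments above can be made concrete with a toy latency model, assuming layer-wise pipeline parallelism (the model's layers split across nodes): every generated token still passes through all layers in order, so compute time per token doesn't shrink, and each node boundary adds a network hop. The numbers below are hypothetical, purely for illustration.

```python
def per_token_latency_ms(total_compute_ms: float, nodes: int, hop_ms: float) -> float:
    """Toy model of layer-wise pipeline parallelism for ONE request.

    A single request's token visits every layer sequentially, so the full
    compute time is always paid; each of the (nodes - 1) node boundaries
    adds one network hop per token.
    """
    return total_compute_ms + (nodes - 1) * hop_ms

# Hypothetical: 50 ms of total layer compute, 5 ms per hop.
one_node = per_token_latency_ms(50.0, 1, 5.0)   # 50 ms/token -> 20 tok/s
two_nodes = per_token_latency_ms(50.0, 2, 5.0)  # 55 ms/token -> ~18 tok/s
```

With slow links (e.g. Wi-Fi, where hop_ms can dwarf compute), the hop term dominates and single-request throughput can halve or worse, which matches the experience reported above. The wins from adding nodes are fitting models too large for any one machine and serving concurrent requests on different pipeline stages, not speeding up a single request.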
Wen benchmarks?