Getting 2 tokens at 0.1 t/s on Llama 3.3 70B (5 node M1 cluster), then stops

Open bricolage opened this issue 1 year ago • 3 comments

I have a fresh install of exo on 5 x M1s connected over gigabit Ethernet, with a total of 88 GB RAM and 39 TFLOPS.

The nodes have all downloaded their model shards. After I submit a prompt, I get 2 tokens after 60 seconds (reported at 0.1 t/s), and then generation stops.

So first, why is it failing after 2 tokens, and second, why is it that slow? What kind of rates would you expect in this config? I also notice that the cluster shifts from 5 nodes to 4 nodes every minute or so, then goes back to 5. And what do the colored node icons mean? I see a red one moving around very slowly. Thanks!

bricolage avatar Dec 08 '24 13:12 bricolage

We had a similar issue at work.

We tested using three Mac Studios. With only two, tokens were generated, but generation stopped after about two sentences.

With more than two Mac Studios, we saw the same kind of issue you describe.

The model was Llama 3.1 405B, and each computer had 192 GB of RAM.

This was also a clean install. All model files were correct and not corrupted.

(I should also note that we used the DEBUG utility, but no problems were reported.)

I can also open a new issue in case this is unrelated.

BenjaminE98 avatar Dec 09 '24 12:12 BenjaminE98

I assume you have a single switch between all the devices? Which model did you test with? What happens when you try with Llama 1B?

AFDudley avatar Dec 28 '24 22:12 AFDudley

Please, I need a little bit of help. I am trying to run exo on WSL as well. When I run just the exo command, it shows me the available commands. But when I try to run the command "exo run llama-3.2-3b", I get the following error:

error: unknown command "run" for "exo"

Did you mean this? runstatus

Please help.

Fahad16301139 avatar Feb 06 '25 03:02 Fahad16301139