Alex Cheema
> If ok, I would also like to fix the bug

That would be welcomed. Thank you.
Happy for you to take this one. Your approach looks good, except it doesn't take latency into account. A few tweaks to account for latency should make it work.
Do you mean a custom model, or a dataset?
> Latency is often under 100ms (for me). TP (Tensor Parallelism) shouldn't be affected by high latency. Also latency tests can be done on the device to create optimal routes...
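To make the latency point concrete, here is a minimal sketch of what "taking latency into account" when choosing routes could look like: rank candidate nodes by measured link latency plus estimated compute time, rather than by throughput alone. All names and numbers below are assumptions for illustration, not exo's actual partitioning code.

```python
# Hypothetical sketch: order nodes by effective time per hop, where the
# per-device latency measurements mentioned above feed into route selection.
def score_node(throughput_tokens_per_s, latency_s, tokens_per_hop=1.0):
    # Effective time per hop = link latency + compute time for the tokens.
    return latency_s + tokens_per_hop / throughput_tokens_per_s

def order_nodes(nodes):
    """nodes: dict of name -> (throughput_tokens_per_s, latency_s).
    Returns node names sorted best-first (lowest effective time)."""
    return sorted(nodes, key=lambda name: score_node(*nodes[name]))

# A slower node with a much faster link can still win the ordering:
print(order_nodes({"fast_far": (100.0, 0.05), "slow_near": (50.0, 0.01)}))
```

Under these made-up numbers, `slow_near` scores 0.01 + 0.02 = 0.03 s per hop versus 0.06 s for `fast_far`, so latency-aware ordering prefers it.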
First of all, thanks a lot for taking the time to run exo when it's still experimental. Most of all, thank you so much for making an issue - these...
I pushed a quality-of-life improvement so you can use the ChatGPT-compatible API endpoint from any node: https://github.com/exo-explore/exo/commit/8a35fd83f6e07b51b62e0dbe49028c9ef5f0455b
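For anyone trying this out, a minimal sketch of hitting that endpoint follows. The host, port, model name, and URL path here are assumptions (standard OpenAI-style chat-completions shape), not values taken from the commit; substitute your own node's address and a model your cluster actually serves.

```python
import json

# Hypothetical node address -- any node in the cluster should work.
NODE_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt, model="llama-3-8b"):
    """Build an OpenAI-style chat-completion payload (model name is a placeholder)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Hello from any node")
print(json.dumps(payload))

# To actually send it (requires a running exo node at NODE_URL):
#   import urllib.request
#   req = urllib.request.Request(
#       NODE_URL,
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
```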
Closed by mistake
Can you try this again @matt-pulsipher? I can't reproduce anymore, and I fixed a few things recently.
> > First of all, thanks a lot for taking the time to run exo when it's still experimental. Most of all, thank you so much for making an issue...
Awesome, let me know if you need any help / want to run anything by me! Also, I love this idea - made an issue for it here: https://github.com/exo-explore/exo/issues/52