Multiple nodes do not speed up inference on large models
Hello, thank you very much for open-sourcing such an excellent project. I've run into a problem: when I run a large model across two Mac computers, inference is no faster than with a single Mac. How can I solve this? Thank you very much.
Currently exo does pipeline-parallel inference, which is faster than offloading when a single device can't fit the entire model. It doesn't reduce latency for a single request, though: each token still passes through every layer in sequence, just spread across devices. If a single device can fit the entire model, then just do that -- no need for exo.
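To illustrate (this is a toy sketch, not exo's actual code): pipeline parallelism splits a model's layers across devices, but one request still flows through every layer one after another, so the total sequential work per token is unchanged.

```python
# Toy sketch of pipeline-parallel inference (illustrative only, not exo's
# implementation). Each "device" holds a contiguous slice of layers;
# activations are handed from one stage to the next in order.

def run_pipeline(stages, x):
    """Run one request through all stages sequentially."""
    for stage in stages:          # stages execute one after another
        for layer in stage:
            x = layer(x)
    return x

# Toy "model": 4 layers that each add 1.
layers = [lambda v: v + 1 for _ in range(4)]

one_device = [layers]                    # all 4 layers on one machine
two_devices = [layers[:2], layers[2:]]   # 2 layers per machine

# Same result, same amount of sequential work either way:
assert run_pipeline(one_device, 0) == run_pipeline(two_devices, 0) == 4
```

The win is memory capacity, not latency: with two devices, each one only needs to hold half the layers, which is what lets a model that doesn't fit on one Mac run at all.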
We're working on other kinds of parallelism that will improve inference speed as you add more devices.
Got it, thank you for your answer.