Multiple nodes do not speed up inference on large models
Hello, thank you very much for open-sourcing such an excellent project. I've run into a problem: when I run a large model across two Mac computers, inference is no faster than with a single Mac. How can I solve this? Thank you very much.
Currently exo does pipeline-parallel inference, which is faster than offloading when a single device can't fit the entire model. It doesn't reduce latency for a single request, though: each token still passes through every layer in sequence, just spread across devices. If a single device can fit the entire model, then just do that -- no need for exo.
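To illustrate (this is a toy sketch, not exo's actual code): pipeline parallelism splits a model's layers across devices, but one request still flows through every layer one after another, so the total sequential work per token is unchanged.

```python
# Toy sketch of pipeline-parallel inference (illustrative only, not exo's
# implementation). Each "device" holds a contiguous slice of layers;
# activations are handed from one stage to the next in order.

def run_pipeline(stages, x):
    """Run one request through all stages sequentially."""
    for stage in stages:          # stages execute one after another
        for layer in stage:
            x = layer(x)
    return x

# Toy "model": 4 layers that each add 1.
layers = [lambda v: v + 1 for _ in range(4)]

one_device = [layers]                    # all 4 layers on one machine
two_devices = [layers[:2], layers[2:]]   # 2 layers per machine

# Same result, same amount of sequential work either way:
assert run_pipeline(one_device, 0) == run_pipeline(two_devices, 0) == 4
```

The win is memory capacity, not latency: with two devices, each one only needs to hold half the layers, which is what lets a model that doesn't fit on one Mac run at all.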
We're working on other kinds of parallelism that will improve inference speed as you add more devices.
Got it, thank you for your answer.