llama.cpp mpi : attempt inference of 65B LLaMA on a cluster of Raspberry Pis

Status: Open. Opened by ggerganov (53 comments).

Now that distributed inference is supported thanks to the work of @evanmiller in #2099, it would be fun to use it for something cool. One idea is to connect a bunch of Raspberry Pis on a local network and run the inference using MPI:

# sample cluster of 8 devices (replace with actual IP addresses of the devices)
$ cat ./hostfile
192.168.0.1:1
192.168.0.2:1
192.168.0.3:1
192.168.0.4:1
192.168.0.5:1
192.168.0.6:1
192.168.0.7:1
192.168.0.8:1

# build with MPI support
$ make CC=mpicc CXX=mpicxx LLAMA_MPI=1 -j

# run distributed inference over 8 nodes
$ mpirun -hostfile ./hostfile -n 8 ./main -m /mnt/models/65B/ggml-model-q4_0.bin -p "I believe the meaning of life is" -n 64
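Typing one hostfile line per Pi gets tedious as the cluster grows, so the file can be generated instead. Below is a minimal POSIX sh sketch, assuming the nodes sit on consecutive addresses of one subnet. `make_hostfile` is a hypothetical helper, not part of llama.cpp. It emits the same `host:slots` syntax as the example, which is the format MPICH's Hydra launcher reads. Open MPI hostfiles use `host slots=N` instead, so the `printf` format may need adjusting depending on which MPI is installed.

```shell
#!/bin/sh
# make_hostfile PREFIX FIRST COUNT [SLOTS]
# Hypothetical helper: prints COUNT hostfile lines of the form PREFIX.N:SLOTS,
# numbering hosts from FIRST. SLOTS defaults to 1 (one MPI process per Pi).
# Assumes MPICH-style "host:slots" lines; Open MPI would want "host slots=N".
make_hostfile() {
    mh_prefix=$1
    mh_i=$2
    mh_end=$(( $2 + $3 ))
    mh_slots=${4:-1}
    while [ "$mh_i" -lt "$mh_end" ]; do
        printf '%s.%d:%d\n' "$mh_prefix" "$mh_i" "$mh_slots"
        mh_i=$(( mh_i + 1 ))
    done
}
```

For example, `make_hostfile 192.168.0 1 8 > ./hostfile` reproduces the file shown above. Launching a trivial non-MPI program first, such as `mpirun -hostfile ./hostfile -n 8 hostname`, is a cheap way to confirm that every node is reachable before loading a 65B model.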

The commands above assume that the 65B model data is located on a network share mounted at /mnt and that mmap works over a network share. Not sure if that is the case; if not, this experiment would be more difficult to perform.
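The reason to lean on a shared mount and mmap is that the quantized model is far larger than any single Pi's RAM. A rough sizing sketch in POSIX sh follows, built on estimates rather than measurements: q4_0 stores each block of 32 weights in 18 bytes (4.5 bits per weight); LLaMA 65B has about 65.2e9 parameters in 80 layers; layers are split evenly across nodes; and mmap only pages in the weights a node actually touches. The helper names are hypothetical.

```shell
#!/bin/sh
# Back-of-the-envelope sizing, under assumed numbers (not measurements):
#   - q4_0: 32 weights per 18-byte block (16 bytes of 4-bit values plus a
#     2-byte fp16 scale), i.e. 4.5 bits per weight
#   - LLaMA 65B: ~65.2e9 parameters, 80 transformer layers
#   - assumed even, contiguous split of layers across nodes; embedding and
#     output matrices are not treated separately, so this is approximate

# q4_0_bytes PARAMS: approximate size in bytes of PARAMS weights in q4_0
q4_0_bytes() {
    echo $(( $1 * 18 / 32 ))
}

# layers_per_node N_LAYERS N_NODES: ceiling division, so no layer is dropped
layers_per_node() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# per_node_bytes PARAMS N_LAYERS N_NODES: weight bytes the busiest node reads
per_node_bytes() {
    echo $(( $(q4_0_bytes "$1") * $(layers_per_node "$2" "$3") / $2 ))
}
```

With these numbers, `per_node_bytes 65200000000 80 8` prints 4584375000, i.e. roughly 4.6 GB per Pi out of a ~36.7 GB file, not counting the KV cache and other runtime buffers. Whether each node really reads only its slice depends on how the mmap'd file is accessed over the share, which is the open question above. Running `mpirun -hostfile ./hostfile -n 8 ls -l /mnt/models/65B/ggml-model-q4_0.bin` is a quick way to check that every node at least sees the file.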

Looking for people with access to the necessary hardware to perform this experiment.

Jul 10 '23 16:07 ggerganov
