
how many rpc-host should I start on remote server

vino5211 opened this issue on Feb 14, 2025 · 3 comments

Discussed in https://github.com/ggerganov/llama.cpp/discussions/11858

Originally posted by vino5211 on February 14, 2025:

I have 4 GPU servers A, B, C, and D, each with 4 NVIDIA A800 80GB PCIe cards. I started rpc-server on B, C, and D. Below is the output of the rpc-server command: it found 4 CUDA devices, but only Device 0 is used on each server. So the question is: how many rpc-server instances should I start on a remote server that has 4 CUDA devices?

```
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
         Never expose the RPC server to an open network!
         This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
  Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
  Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
  Device 2: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
  Device 3: NVIDIA A800 80GB PCIe, compute capability 8.0, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 80614 MB
```
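For context, the usual workaround when rpc-server drives only one device per instance is to run one instance per GPU, pinning each to a device with `CUDA_VISIBLE_DEVICES` and giving it its own port. A minimal sketch of that setup, assuming the stock `rpc-server` flags (`-H`, `-p`), that ports 50052-50055 are free, and that `model.gguf` is a placeholder path:

```sh
# On each remote server (B, C, D): pin one rpc-server instance to
# each GPU via CUDA_VISIBLE_DEVICES, each on its own port.
# The port numbers 50053-50055 are arbitrary choices, not requirements.
CUDA_VISIBLE_DEVICES=0 ./rpc-server -H 0.0.0.0 -p 50052 &
CUDA_VISIBLE_DEVICES=1 ./rpc-server -H 0.0.0.0 -p 50053 &
CUDA_VISIBLE_DEVICES=2 ./rpc-server -H 0.0.0.0 -p 50054 &
CUDA_VISIBLE_DEVICES=3 ./rpc-server -H 0.0.0.0 -p 50055 &

# On server A: list every host:port endpoint with --rpc so the main
# binary (built with -DGGML_RPC=ON) can see all 12 remote GPUs.
./llama-cli -m model.gguf -ngl 99 \
  --rpc B:50052,B:50053,B:50054,B:50055,\
C:50052,C:50053,C:50054,C:50055,\
D:50052,D:50053,D:50054,D:50055
```

Note that the security warning in the log above applies to every instance: binding to 0.0.0.0 exposes each rpc-server to the network, so this should only be done on a trusted, isolated network.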

vino5211 · Feb 14, 2025