Bug: rpc-server --mem doesn't match backend memory
What happened?
$ CUDA_VISIBLE_DEVICES=0 build/bin/Release/rpc-server -p 50052 --mem 10000
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 1808 MB
$ CUDA_VISIBLE_DEVICES=0 build/bin/Release/rpc-server -p 50052 --mem 20000
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 3616 MB
$ CUDA_VISIBLE_DEVICES=0 build/bin/Release/rpc-server -p 50052 --mem 30000
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 1328 MB
I expected the server to report backend memory: $mem MB when I pass --mem $mem.
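For what it's worth, the reported values are exactly the requested values modulo 4096 MB (10000 -> 1808, 20000 -> 3616, 30000 -> 1328), which suggests the MB-to-bytes conversion is wrapping around at 2^32 bytes. Below is a minimal sketch of that failure mode, assuming the conversion happens in a 32-bit unsigned type (unsigned long is 32 bits under MSVC, which would also explain why this shows up on Windows builds specifically); it is illustrative, not the actual rpc-server code:

#include <cstdint>
#include <cstdio>

int main() {
    // uint32_t stands in for a 32-bit unsigned long (as on MSVC).
    // The MB -> bytes conversion wraps at 2^32 for any value >= 4096 MB.
    for (uint32_t mem_mb : {10000u, 20000u, 30000u}) {
        uint32_t wrapped = mem_mb * 1024u * 1024u;           // wraps at 2^32 bytes
        uint64_t widened = (uint64_t) mem_mb * 1024 * 1024;  // widen first: no wrap
        std::printf("--mem %u -> wrapped: %u MB, widened: %llu MB\n",
                    mem_mb,
                    wrapped / (1024u * 1024u),
                    (unsigned long long) (widened / (1024 * 1024)));
    }
    return 0;
}

This prints 1808, 3616 and 1328 MB for the wrapped variant, matching the logs above; on a platform where unsigned long is 64 bits (e.g. Linux x64) the same expression would not wrap, and the printed value would match the requested one.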
Name and Version
$ ./build/bin/Release/llama-cli --version
version: 3368 (dd07a123)
built with MSVC 19.40.33812.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output
(Same output as shown under "What happened?" above.)
This should actually be high severity.
Memory limits (rpc-server --mem) are not working!!
> Memory limits (rpc-server --mem) are not working!!

I know? That's what I'm saying?
There is a problem where all memory is used even if --mem is specified.
> There is a problem where all memory is used even if --mem is specified.

Awesome. /s Thanks for telling me though.
Layer offloading is controlled only by --ngl, not by --mem, so the server crashes with a buffer overflow when the offloaded layers don't fit in memory.
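For context on what enforcing the limit could look like: a hedged sketch of server-side budget tracking, where the server refuses allocations beyond the advertised --mem budget instead of crashing. This is illustrative only, not current rpc-server behavior, and the struct and member names are hypothetical:

#include <cstddef>

// Hypothetical allocation budget for an RPC server: track bytes handed out
// and reject requests that would exceed the --mem limit, so the client can
// fail cleanly instead of the server crashing.
struct mem_budget {
    size_t limit;      // bytes advertised via --mem
    size_t used = 0;   // bytes currently allocated

    bool try_alloc(size_t n) {
        if (n > limit - used) return false;  // over budget: reject the allocation
        used += n;
        return true;
    }
    void release(size_t n) { used -= n; }
};

With a guard like this, an oversized --ngl offload would be rejected up front rather than overflowing a buffer.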
Ideally, it would be better to change the specification so that -ngl can be set individually on the RPC server side.
> Ideally, it would be better to change the specification so that -ngl can be set individually on the RPC server side.

I think fixing --mem would be better. Remote servers should be as hands-off as possible, and -ngl should ideally become a --mem-style option as well; that would make much more sense than -ngl.
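To illustrate the memory-driven placement being proposed here: a rough sketch that splits layers across servers in proportion to the free memory each one advertises, rather than taking a fixed -ngl per server. The function and its signature are hypothetical, not llama.cpp's actual API, and the idea only works once the advertised memory is correct, i.e. once --mem is fixed:

#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical: assign n_layers across RPC servers in proportion to the
// free memory each server advertises, instead of a fixed -ngl per server.
static std::vector<int> split_layers_by_memory(int n_layers,
                                               const std::vector<uint64_t> & free_mem) {
    uint64_t total = 0;
    for (uint64_t m : free_mem) total += m;
    std::vector<int> layers(free_mem.size(), 0);
    int assigned = 0;
    for (size_t i = 0; i < free_mem.size(); ++i) {
        layers[i] = (int) ((uint64_t) n_layers * free_mem[i] / total);
        assigned += layers[i];
    }
    layers.back() += n_layers - assigned;  // hand leftover layers to the last server
    return layers;
}

int main() {
    // Two servers advertising 10000 MB and 20000 MB free -> a 1:2 layer split.
    std::vector<uint64_t> mem = { 10000ull << 20, 20000ull << 20 };
    std::vector<int> layers = split_layers_by_memory(33, mem);
    for (size_t i = 0; i < layers.size(); ++i) {
        std::printf("server %zu: %d layers\n", i, layers[i]);
    }
    return 0;
}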
q.v. the issues linked below.
I also found the way the RPC server and client deal with specifying and limiting memory across CPU/GPU resources confusing and limited, so I, too, would like to see a simple, clear means of limiting how much memory (RAM/VRAM) is used on each node. IMO it would also be nicer if the model data could be loaded locally instead of uploaded over the network to the RPC servers. To that end I filed:
#8112 Bug: [RPC] RPC apparently isn't honoring backend memory capacity et al.
#8113 Feature Request: Provide means to quantify the restriction of RAM/VRAM usage for each GPU and system RAM.
#8114 Feature Request: It would be convenient and faster if users could specify that the model data used for a rpc-server instance is already available by some fast(er) means (file system GGUF, whatever).
This issue was closed because it has been inactive for 14 days since being marked as stale.
This still happens; it should be reopened.