llama.cpp
Eval bug: RAM not released after llama-bench fails to load DeepSeek-R1 Q6 with CUDA 12.8
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
version: 4743 (d07c6213)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
Intel Xeon E5-2686 v4 + NVIDIA GeForce RTX 2080 Ti 22 GB
Models
DeepSeek-R1-1776 Q6_K (r1-1776-Q6_K, split GGUF, 12 parts)
Problem description & steps to reproduce
Run llama-bench (build 4743, CUDA) against the Q6_K DeepSeek-R1-1776 GGUF:

llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
    --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
    --threads 36 --mmap 0 \
    --numa distribute \
    -ngl 3

llama-bench fails to load the model (full log below). But then: after the failed run, roughly 490 GB of system RAM stays in use and is never released 😭😭😭
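One way to observe the leak is to compare /proc/meminfo before and after the failed run. This is only a minimal sketch using standard procfs inspection; the command and the ~490 GB figure come from this report, nothing here is llama.cpp-specific:

# snapshot of memory state before the run
grep -E 'MemTotal|MemAvailable|Cached|HugePages_Total' /proc/meminfo

# the failing run (same flags as above)
llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
    --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
    --threads 36 --mmap 0 --numa distribute -ngl 3

# after llama-bench has exited, MemAvailable should return to the first
# snapshot; here it stays roughly 490 GB lower
grep -E 'MemTotal|MemAvailable|Cached|HugePages_Total' /proc/meminfo
ps aux | grep -i llama-bench   # confirm no llama-bench process is still alive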
First Bad Commit
No response
Relevant log output
llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
--model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
--threads 36 --mmap 0 \
--numa distribute \
-ngl 3
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
| model | size | params | backend | ngl | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | ------------: | -------------------: |
main: error: failed to load model '/mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf'