llama.cpp
Eval bug: RAM not released after llama-bench fails to load DeepSeek-R1 Q6 with CUDA 12.8
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
version: 4743 (d07c6213)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
Intel Xeon E5-2686 v4 + NVIDIA GeForce RTX 2080 Ti 22 GB
Models
DeepSeek-R1-1776 Q6_K (r1-1776-Q6_K, split GGUF, 12 parts)
Problem description & steps to reproduce
Run llama-bench (build 4743, CUDA) against the Q6_K DeepSeek-R1-1776 GGUF:

llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
    --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
    --threads 36 --mmap 0 \
    --numa distribute \
    -ngl 3

llama-bench fails to load the model (full log below). But then: after the failed run, roughly 490 GB of system RAM stays in use and is never released 😭😭😭
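One way to observe the leak is to compare /proc/meminfo before and after the failed run. This is only a minimal sketch using standard procfs inspection; the command and the ~490 GB figure come from this report, nothing here is llama.cpp-specific:

# snapshot of memory state before the run
grep -E 'MemTotal|MemAvailable|Cached|HugePages_Total' /proc/meminfo

# the failing run (same flags as above)
llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
    --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
    --threads 36 --mmap 0 --numa distribute -ngl 3

# after llama-bench has exited, MemAvailable should return to the first
# snapshot; here it stays roughly 490 GB lower
grep -E 'MemTotal|MemAvailable|Cached|HugePages_Total' /proc/meminfo
ps aux | grep -i llama-bench   # confirm no llama-bench process is still alive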
First Bad Commit
No response
Relevant log output
llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
--model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
--threads 36 --mmap 0 \
--numa distribute \
-ngl 3
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
| model | size | params | backend | ngl | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | ------------: | -------------------: |
main: error: failed to load model '/mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf'