llama.cpp
Eval bug: Segmentation fault on vanilla Ubuntu, only with versions after 4460; fix: /usr/local/lib/libllama.so needs replacing by hand
Name and Version
Worked at least up to:
llama-cli --version
version: 4460 (ba8a1f9c)
built with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
Fails with the freshly compiled version:
$ build/bin/llama-cli --version
version: 4564 (acd38efe)
built with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
No build errors whatsoever:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Configuring done (0.4s)
-- Generating done (6.6s)
-- Build files have been written to: .../Downloads/llama.cpp/build
...
Operating systems
Linux
GGML backends
CPU
Hardware
A USB-stick-based OS, running on e.g.:
Name: Intel Core i5-10400
Microarchitecture: Comet Lake
Technology: 14nm
Max Frequency: 4.300 GHz
Cores: 6 cores (12 threads)
AVX: AVX,AVX2
FMA: FMA3
L1i Size: 32KB (192KB Total)
L1d Size: 32KB (192KB Total)
L2 Size: 256KB (1.5MB Total)
L3 Size: 12MB
Models
Fails immediately:
build/bin/llama-cli --model "/media/.../DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>' -c 8192
build: 4564 (acd38efe) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
Segmentation fault
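One quick diagnostic (my addition, not from the original report): ldd prints which file each shared-library dependency resolves to, so running it on the freshly built binary shows whether a stale /usr/local/lib/libllama.so is being picked up instead of the new build. A stand-in sketch using /bin/ls, since the exact llama.cpp paths vary per setup:

```shell
# ldd shows which file each shared library dependency resolves to.
# Stand-in demonstration on /bin/ls; for this report one would run
#   ldd build/bin/llama-cli
# and check whether libllama.so resolves to /usr/local/lib (a stale
# system-wide install) or to the freshly built copy.
ldd /bin/ls
```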
Works with the older version, compiled with the same tools, on the same OS etc.:
llama-cli --model "/media/.../Moje dokumenty/Mata/LLMs/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>'
build: 4460 (ba8a1f9c) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 32 key-value pairs and 292 tensors from /.../DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Llama 8B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Llama
llama_model_loader: - kv 5: general.size_label str = 8B
llama_model_loader: - kv 6: llama.block_count u32 = 32
llama_model_loader: - kv 7: llama.context_length u32 = 131072
llama_model_loader: - kv 8: llama.embedding_length u32 = 4096
llama_model_loader: - kv 9: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 10: llama.attention.head_count u32 = 32
llama_model_loader: - kv 11: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 500000,000000
llama_model_loader: - kv 13: llama.attention.layer_norm_rms_epsilon f32 = 0,000010
llama_model_loader: - kv 14: llama.attention.key_length u32 = 128
llama_model_loader: - kv 15: llama.attention.value_length u32 = 128
llama_model_loader: - kv 16: llama.vocab_size u32 = 128256
llama_model_loader: - kv 17: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 18: tokenizer.ggml.model str = gpt2
Problem description & steps to reproduce
Just configure and build the usual way (https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md), the same procedure that has worked for months, maybe years, on many platforms. The compiler is unchanged:
clang --version
Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
That is, the vanilla setup for:
-------------------
OS: Zorin OS 17.2 x86_64
Host: PCS Aer H510B -CF
Kernel: 6.8.0-51-generic
Uptime: 1 hour, 35 mins
Packages: 3092 (dpkg), 18 (flatpak), 20 (snap)
Shell: bash 5.1.16
...
CPU: Intel i5-10400 (12) @ 4.300GHz
GPU: Intel CometLake-S GT2 [UHD Graphics 630]
Memory: 4839MiB / 15820MiB
with lots of swap.
First Bad Commit
Some commit after version 4460 (ba8a1f9c); not sure which one exactly.
Relevant log output
Just:
main: load the model and apply lora adapter, if any
Segmentation fault
I may try to recompile with debug symbols and run `gdb ./build/bin/llama-cli`, but for now, FYI that it happens.
If possible, please run a git bisect to nail down the exact commit that introduced the problem.
Update: gdb does not reveal much here, either:
(gdb) run
Starting program: ... /Downloads/llama.cpp/build/bin/llama-cli --model "/media/.../DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>' -c 8192
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7400640 (LWP 57461)]
build: 4564 (acd38efe) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
Thread 1 "llama-cli" received signal SIGSEGV, Segmentation fault.
0x00007ffff7f0ae70 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, llama_model_kv_override const*) () from /usr/local/lib/libllama.so
(gdb) backtrace
#0 0x00007ffff7f0ae70 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, llama_model_kv_override const*) () from /usr/local/lib/libllama.so
#1 0x00007ffff7e93f48 in llama_model_load_from_file () from /usr/local/lib/libllama.so
#2 0x000055555565140e in common_init_from_params (params=...) at ... /Downloads/llama.cpp/common/common.cpp:911
#3 0x00005555555b54b9 in main (argc=11, argv=0x7fffffffd6e8) at ... /Downloads/llama.cpp/examples/main/main.cpp:150
(gdb)
Update: this worked: `sudo mv /usr/local/lib/libllama.so /usr/local/lib/libllama.so.bak`. Now it behaves as it used to:
/llama.cpp/build/bin$ ./llama-cli --model "/media/zezen/OS/Users/oin/Moje dokumenty/Mata/LLMs/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>' -c 8192
build: 4564 (acd38efe) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 32 key-value pairs and 292 tensors from /media/zezen/OS/Users/oin/Moje dokumenty/Mata/LLMs/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Llama 8B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Llama
....
So the fix is simply to replace (or remove) the stale .so files in /usr/local/lib ...
This issue was closed because it has been inactive for 14 days since being marked as stale.