llama.cpp
Eval bug: Segmentation fault on vanilla Ubuntu, only with versions after 4460; fix: /usr/local/lib/libllama.so needs replacing by hand
Name and Version
Worked at least up to:
llama-cli --version
version: 4460 (ba8a1f9c)
built with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
Fails with the freshly compiled version:
$ build/bin/llama-cli --version
version: 4564 (acd38efe)
built with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
No build errors whatsoever:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Configuring done (0.4s)
-- Generating done (6.6s)
-- Build files have been written to: .../Downloads/llama.cpp/build
...
Operating systems
Linux
GGML backends
CPU
Hardware
A USB-stick-based OS, running on e.g.:
Name: Intel Core i5-10400
Microarchitecture: Comet Lake
Technology: 14nm
Max Frequency: 4.300 GHz
Cores: 6 cores (12 threads)
AVX: AVX,AVX2
FMA: FMA3
L1i Size: 32KB (192KB Total)
L1d Size: 32KB (192KB Total)
L2 Size: 256KB (1.5MB Total)
L3 Size: 12MB
Models
Fails immediately:
build/bin/llama-cli --model "/media/.../DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>' -c 8192
build: 4564 (acd38efe) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
Segmentation fault
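One quick diagnostic (my addition, not from the original report): ldd prints which file each shared-library dependency resolves to, so running it on the freshly built binary shows whether a stale /usr/local/lib/libllama.so is being picked up instead of the new build. A stand-in sketch using /bin/ls, since the exact llama.cpp paths vary per setup:

```shell
# ldd shows which file each shared library dependency resolves to.
# Stand-in demonstration on /bin/ls; for this report one would run
#   ldd build/bin/llama-cli
# and check whether libllama.so resolves to /usr/local/lib (a stale
# system-wide install) or to the freshly built copy.
ldd /bin/ls
```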
Works with the older version, compiled with the same tools, on the same OS etc.:
llama-cli --model "/media/.../Moje dokumenty/Mata/LLMs/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>'
build: 4460 (ba8a1f9c) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 32 key-value pairs and 292 tensors from /.../DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Llama 8B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Llama
llama_model_loader: - kv 5: general.size_label str = 8B
llama_model_loader: - kv 6: llama.block_count u32 = 32
llama_model_loader: - kv 7: llama.context_length u32 = 131072
llama_model_loader: - kv 8: llama.embedding_length u32 = 4096
llama_model_loader: - kv 9: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 10: llama.attention.head_count u32 = 32
llama_model_loader: - kv 11: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 500000,000000
llama_model_loader: - kv 13: llama.attention.layer_norm_rms_epsilon f32 = 0,000010
llama_model_loader: - kv 14: llama.attention.key_length u32 = 128
llama_model_loader: - kv 15: llama.attention.value_length u32 = 128
llama_model_loader: - kv 16: llama.vocab_size u32 = 128256
llama_model_loader: - kv 17: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 18: tokenizer.ggml.model str = gpt2
Problem description & steps to reproduce
Just configure and build the usual way (https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md), the same procedure that has worked for months, maybe years, on many platforms. The compiler is unchanged:
clang --version
Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
That is, the vanilla setup for:
-------------------
OS: Zorin OS 17.2 x86_64
Host: PCS Aer H510B -CF
Kernel: 6.8.0-51-generic
Uptime: 1 hour, 35 mins
Packages: 3092 (dpkg), 18 (flatpak), 20 (snap)
Shell: bash 5.1.16
...
CPU: Intel i5-10400 (12) @ 4.300GHz
GPU: Intel CometLake-S GT2 [UHD Graphics 630]
Memory: 4839MiB / 15820MiB
with lots of swap.
First Bad Commit
Some commit after version 4460 (ba8a1f9c); not sure which one exactly.
Relevant log output
Just:
main: load the model and apply lora adapter, if any
Segmentation fault
I may try to recompile with debug symbols and run `gdb ./build/bin/llama-cli`, but for now, FYI that it happens.
If possible, please run a git bisect to nail down the exact commit that introduced the problem.
Update: gdb does not reveal much here, either:
(gdb) run
Starting program: ... /Downloads/llama.cpp/build/bin/llama-cli --model "/media/.../DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>' -c 8192
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7400640 (LWP 57461)]
build: 4564 (acd38efe) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
Thread 1 "llama-cli" received signal SIGSEGV, Segmentation fault.
0x00007ffff7f0ae70 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, llama_model_kv_override const*) () from /usr/local/lib/libllama.so
(gdb) backtrace
#0 0x00007ffff7f0ae70 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, llama_model_kv_override const*) () from /usr/local/lib/libllama.so
#1 0x00007ffff7e93f48 in llama_model_load_from_file () from /usr/local/lib/libllama.so
#2 0x000055555565140e in common_init_from_params (params=...) at ... /Downloads/llama.cpp/common/common.cpp:911
#3 0x00005555555b54b9 in main (argc=11, argv=0x7fffffffd6e8) at ... /Downloads/llama.cpp/examples/main/main.cpp:150
(gdb)
Update: this worked: `sudo mv /usr/local/lib/libllama.so /usr/local/lib/libllama.so.bak`. Now it behaves as it used to:
/llama.cpp/build/bin$ ./llama-cli --model "/media/zezen/OS/Users/oin/Moje dokumenty/Mata/LLMs/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf" --cache-type-k q8_0 --threads 16 --prompt '<|User|>What is Tao?<|Assistant|>' -c 8192
build: 4564 (acd38efe) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 32 key-value pairs and 292 tensors from /media/zezen/OS/Users/oin/Moje dokumenty/Mata/LLMs/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Llama 8B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Llama
....
So the fix is simply to replace (or remove) the stale .so files in /usr/local/lib ...
This issue was closed because it has been inactive for 14 days since being marked as stale.