[User] Segfault when saving session cache since ecb217d
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Saving the session cache should not segfault (it worked before ecb217d).
Current Behavior
Segfault when saving session cache since ecb217d
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
$ system_profiler SPHardwareDataType
Hardware:
Hardware Overview:
Model Name: Mac Studio
Model Identifier: Mac13,1
Model Number: Z14J000LLX/A
Chip: Apple M1 Max
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 64 GB
System Firmware Version: 8422.100.650
OS Loader Version: 8422.100.650
- Operating System, e.g. for Linux:
$ sw_vers
ProductName: macOS
ProductVersion: 13.3.1
ProductVersionExtra: (a)
BuildVersion: 22E772610a
$ uname -a
Darwin workstation.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000 arm64
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.3
$ make --version
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
This program built for i386-apple-darwin11.3.0
$ g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Failure Information (for bugs)
Segfault when saving session cache since ecb217d
Steps to Reproduce
$ git checkout ecb217d
$ make clean; make
$ rm -f /tmp/prompt.cache; ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"
Failure Logs
$ rm -f /tmp/prompt.cache; ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"
main: build = 612 (ecb217d)
main: seed = 0
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: mem required = 1932.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/tmp/prompt.cache'
main: session file does not exist, will create
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 50, n_keep = 0
Top 10 cat memes:fish: Job 1, './main -m ./models/7B/ggml-mode…' terminated by signal SIGSEGV (Address boundary error)
Backtrace after building with LLAMA_DEBUG=1:
$ rm -f /tmp/prompt.cache; lldb -b -o 'run' -k 'bt' -- ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"
(lldb) target create "./main"
Current executable set to '/tmp/llama.cpp/main' (arm64).
(lldb) settings set -- target.run-args "-m" "./models/7B/ggml-model-q4_0.bin" "-n" "50" "-s" "0" "-p" "Top 10 cat memes:" "--prompt-cache" "/tmp/prompt.cache"
(lldb) run
main: build = 612 (ecb217d)
main: seed = 0
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: mem required = 1932.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/tmp/prompt.cache'
main: session file does not exist, will create
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 50, n_keep = 0
Top 10 cat memes:GGML_ASSERT: ggml.c:3986: ((uintptr_t) (ctx->mem_buffer))%GGML_MEM_ALIGN == 0
Process 22227 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x000000018d520724 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
-> 0x18d520724 <+8>: b.lo 0x18d520744 ; <+40>
0x18d520728 <+12>: pacibsp
0x18d52072c <+16>: stp x29, x30, [sp, #-0x10]!
0x18d520730 <+20>: mov x29, sp
Target 0: (main) stopped.
Process 22227 launched: '/tmp/llama.cpp/main' (arm64)
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x000000018d520724 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x000000018d557c28 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x000000018d465ae8 libsystem_c.dylib`abort + 180
frame #3: 0x0000000100010f34 main`ggml_init(params=(mem_size = 4096, mem_buffer = 0x000000016fdebd98, no_alloc = true)) at ggml.c:3986:5
frame #4: 0x000000010004d174 main`::llama_copy_state_data(ctx=0x0000000101009c00, dst=" \U0000001a") at llama.cpp:2739:38
frame #5: 0x000000010004e8d4 main`::llama_save_session_file(ctx=0x0000000101009c00, path_session="/tmp/prompt.cache", tokens=0x0000600000d7f660, n_token_count=9) at llama.cpp:2956:41
frame #6: 0x0000000100003aac main`main(argc=11, argv=0x000000016fdfe6c0) at main.cpp:422:17
frame #7: 0x000000018d1fff28 dyld`start + 2236
Seems to be the ggml_view_3d calls.
https://github.com/ggerganov/llama.cpp/blob/5220a991a5e92bddad9542267ab445a2c033681c/llama.cpp#LL2759-L2765
And, more specifically, this memcpy call:
https://github.com/ggerganov/llama.cpp/blob/5220a991a5e92bddad9542267ab445a2c033681c/ggml.c#L5901
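For reference, the faulting sequence inside ggml_view_3d looks roughly like this (paraphrased from the linked lines, not a verbatim copy). llama_copy_state_data builds its temporary context with no_alloc = true, so the new offs tensor gets no backing data and the memcpy writes through a NULL pointer:
// paraphrased from ggml_view_3d at this commit; ctx here is the temporary
// context created by llama_copy_state_data with no_alloc = true
struct ggml_tensor * offs = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);
memcpy(offs->data, &offset, 2*sizeof(int32_t)); // offs->data is NULL -> SIGSEGV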
I am seeing this as well.
Doesn't work for me either.
Happens to me as well; I'm on Windows using the latest AVX2 release.
Faulting application name: main.exe, version: 0.0.0.0, time stamp: 0x647ede4b
Faulting module name: main.exe, version: 0.0.0.0, time stamp: 0x647ede4b
Exception code: 0xc0000005
Fault offset: 0x000000000003fc81
Faulting process id: 0x73fc
Faulting application start time: 0x01d9987bac3a2a5d
Faulting application path: C:\Syahmi\Devs\llama.cpp\main.exe
Faulting module path: C:\Syahmi\Devs\llama.cpp\main.exe
Report Id: 8d6237af-917a-472e-b99e-37118f6ad500
Faulting package full name:
Faulting package-relative application ID:
I have checked the crash; it's in the same place as @colinc reported.
It crashes in the ggml_view_3d call because offs->data is NULL.
Happening to me on Linux as well, I think, via libllama.so from llama-cpp-python.
...
llama_print_timings: load time = 9348.15 ms
llama_print_timings: sample time = 129.86 ms / 26 runs ( 4.99 ms per token)
llama_print_timings: prompt eval time = 9348.02 ms / 16 tokens ( 584.25 ms per token)
llama_print_timings: eval time = 16120.10 ms / 25 runs ( 644.80 ms per token)
llama_print_timings: total time = 25652.71 ms
Llama._create_completion: cache save
Thread 17 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe019c36c0 (LWP 13731)]
0x00007ffe30eca776 in ggml_view_3d (ctx=0x7ffe30feb2c8 <g_state+200>, a=0x7ffafa000020, ne0=5120,
ne1=41, ne2=40, nb1=10240, nb2=20480000, offset=0) at ggml.c:5972
5972 memcpy(offs->data, &offset, 2*sizeof(int32_t));
(gdb) backtrace
#0 0x00007ffe30eca776 in ggml_view_3d (ctx=0x7ffe30feb2c8 <g_state+200>, a=0x7ffafa000020, ne0=5120,
ne1=41, ne2=40, nb1=10240, nb2=20480000, offset=0) at ggml.c:5972
#1 0x00007ffe30e8d170 in llama_copy_state_data (ctx=0xc2a95f0, dst=0x7ff9c8350010 "$\032")
at llama.cpp:2951
#2 0x00007ffff5a8ce3e in ?? () from /usr/lib/libffi.so.8
#3 0x00007ffff5a890ef in ?? () from /usr/lib/libffi.so.8
#4 0x00007ffff5a8c2c3 in ffi_call () from /usr/lib/libffi.so.8
#5 0x00007ffff59ed99b in ?? ()
from /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so
#6 0x00007ffff59ec854 in ?? ()
from /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so
#7 0x00007ffff7b3cb94 in ?? () from /usr/lib/libpython3.11.so.1.0
#8 0x00007ffff7b4aef6 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#9 0x00007ffff7ba2227 in ?? () from /usr/lib/libpython3.11.so.1.0
#10 0x00007ffff7b4acd8 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#11 0x00007ffff7ba6f83 in ?? () from /usr/lib/libpython3.11.so.1.0
#12 0x00007ffff7ba6acb in ?? () from /usr/lib/libpython3.11.so.1.0
#13 0x00007ffff7b8b3aa in PyObject_Call () from /usr/lib/libpython3.11.so.1.0
#14 0x00007ffff7b4f721 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#15 0x00007ffff7b7e960 in _PyFunction_Vectorcall () from /usr/lib/libpython3.11.so.1.0
#16 0x00007ffff7b4f721 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#17 0x00007ffff7ba6f83 in ?? () from /usr/lib/libpython3.11.so.1.0
#18 0x00007ffff7ba6b08 in ?? () from /usr/lib/libpython3.11.so.1.0
#19 0x00007ffff7ca7a80 in ?? () from /usr/lib/libpython3.11.so.1.0
#20 0x00007ffff7c6a628 in ?? () from /usr/lib/libpython3.11.so.1.0
#21 0x00007ffff787208f in ?? () from /usr/lib/libc.so.6
#22 0x00007ffff79044b0 in ?? () from /usr/lib/libc.so.6
I poked at this a bit this morning and tried increasing the copy ctx size slightly, but that doesn't seem to be the issue. It does seem like the new tensor and copy operation are not required for llama_copy/set_state_data and could be skipped there, if that's somehow an option. Commenting out the new lines avoids the prompt-cache segfault (at the cost of the Metal functionality, I assume).
@ejones What lines did you comment out, exactly? It would be nice to work around this, since the commit is required for the new quant functionality.
This seems to get the prompt cache working at least:
diff --git a/ggml.c b/ggml.c
index 34212b8..62ac19f 100644
--- a/ggml.c
+++ b/ggml.c
@@ -5975,12 +5975,12 @@ struct ggml_tensor * ggml_view_3d(
struct ggml_tensor * result = ggml_new_tensor_impl(ctx, a->type, 3, ne, (char *) a->data + offset);
- ggml_scratch_save(ctx);
+ // ggml_scratch_save(ctx);
- struct ggml_tensor * offs = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);
- memcpy(offs->data, &offset, 2*sizeof(int32_t));
+ // struct ggml_tensor * offs = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);
+ // memcpy(offs->data, &offset, 2*sizeof(int32_t));
- ggml_scratch_load(ctx);
+ // ggml_scratch_load(ctx);
result->nb[1] = nb1;
result->nb[2] = nb2;
@@ -5990,7 +5990,7 @@ struct ggml_tensor * ggml_view_3d(
result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
result->src0 = a;
result->src1 = NULL;
- result->opt[0] = offs;
+ // result->opt[0] = offs;
if (is_node) {
memcpy(result->padding, &offset, sizeof(offset));
I broke this with the Metal changes.
The "view" operators now create a opt tensors to store the offset, but this logic breaks when the context is created with no_alloc == true. Will figure out a fix now