
[User] Segfault when saving session cache since ecb217d

Open sgentle opened this issue 2 years ago • 6 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

No segfault when saving the session cache (as was the case before ecb217d)

Current Behavior

Segfault when saving session cache since ecb217d

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:
$ system_profiler SPHardwareDataType
Hardware:

    Hardware Overview:

      Model Name: Mac Studio
      Model Identifier: Mac13,1
      Model Number: Z14J000LLX/A
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 64 GB
      System Firmware Version: 8422.100.650
      OS Loader Version: 8422.100.650
  • Operating System, e.g. for Linux:
$ sw_vers
ProductName:		macOS
ProductVersion:		13.3.1
ProductVersionExtra:	(a)
BuildVersion:		22E772610a

$ uname -a
Darwin workstation.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000 arm64
  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.3

$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

$ g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Failure Information (for bugs)

Segfault when saving session cache since ecb217d

Steps to Reproduce

  1. $ git checkout ecb217d
  2. $ make clean; make
  3. $ rm -f /tmp/prompt.cache; ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"

Failure Logs

$ rm -f /tmp/prompt.cache; ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"
main: build = 612 (ecb217d)
main: seed  = 0
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 1932.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/tmp/prompt.cache'
main: session file does not exist, will create
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 50, n_keep = 0


 Top 10 cat memes:fish: Job 1, './main -m ./models/7B/ggml-mode…' terminated by signal SIGSEGV (Address boundary error)

Backtrace after building with LLAMA_DEBUG=1:

$ rm -f /tmp/prompt.cache; lldb -b -o 'run' -k 'bt' -- ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"
(lldb) target create "./main"
Current executable set to '/tmp/llama.cpp/main' (arm64).
(lldb) settings set -- target.run-args  "-m" "./models/7B/ggml-model-q4_0.bin" "-n" "50" "-s" "0" "-p" "Top 10 cat memes:" "--prompt-cache" "/tmp/prompt.cache"
(lldb) run
main: build = 612 (ecb217d)
main: seed  = 0
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 1932.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/tmp/prompt.cache'
main: session file does not exist, will create
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 50, n_keep = 0


 Top 10 cat memes:GGML_ASSERT: ggml.c:3986: ((uintptr_t) (ctx->mem_buffer))%GGML_MEM_ALIGN == 0
Process 22227 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x000000018d520724 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
->  0x18d520724 <+8>:  b.lo   0x18d520744               ; <+40>
    0x18d520728 <+12>: pacibsp
    0x18d52072c <+16>: stp    x29, x30, [sp, #-0x10]!
    0x18d520730 <+20>: mov    x29, sp
Target 0: (main) stopped.
Process 22227 launched: '/tmp/llama.cpp/main' (arm64)
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x000000018d520724 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018d557c28 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000018d465ae8 libsystem_c.dylib`abort + 180
    frame #3: 0x0000000100010f34 main`ggml_init(params=(mem_size = 4096, mem_buffer = 0x000000016fdebd98, no_alloc = true)) at ggml.c:3986:5
    frame #4: 0x000000010004d174 main`::llama_copy_state_data(ctx=0x0000000101009c00, dst=" \U0000001a") at llama.cpp:2739:38
    frame #5: 0x000000010004e8d4 main`::llama_save_session_file(ctx=0x0000000101009c00, path_session="/tmp/prompt.cache", tokens=0x0000600000d7f660, n_token_count=9) at llama.cpp:2956:41
    frame #6: 0x0000000100003aac main`main(argc=11, argv=0x000000016fdfe6c0) at main.cpp:422:17
    frame #7: 0x000000018d1fff28 dyld`start + 2236

sgentle avatar Jun 05 '23 09:06 sgentle

Seems to be the ggml_view_3d calls.

https://github.com/ggerganov/llama.cpp/blob/5220a991a5e92bddad9542267ab445a2c033681c/llama.cpp#LL2759-L2765

And, more specifically, this memcpy call:

https://github.com/ggerganov/llama.cpp/blob/5220a991a5e92bddad9542267ab445a2c033681c/ggml.c#L5901
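
For reference, the tail of ggml_view_3d at that commit (the same lines touched by the workaround diff further down this thread) looks roughly like this; the memcpy into the freshly created offs tensor is the line that faults:

    struct ggml_tensor * result = ggml_new_tensor_impl(ctx, a->type, 3, ne, (char *) a->data + offset);

    ggml_scratch_save(ctx);

    // new since the Metal changes: stash the view offset in a small I32 tensor
    struct ggml_tensor * offs = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);
    memcpy(offs->data, &offset, 2*sizeof(int32_t));   // <-- faults when offs->data is NULL

    ggml_scratch_load(ctx);

    result->nb[1] = nb1;
    result->nb[2] = nb2;
    // ...
    result->opt[0] = offs;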

colinc avatar Jun 05 '23 12:06 colinc

I am seeing this as well.

spencekim avatar Jun 06 '23 07:06 spencekim

Doesn't work for me either.

Xarbirus avatar Jun 06 '23 08:06 Xarbirus

Happens to me as well; I'm on Windows using the latest AVX2 release.

Faulting application name: main.exe, version: 0.0.0.0, time stamp: 0x647ede4b
Faulting module name: main.exe, version: 0.0.0.0, time stamp: 0x647ede4b
Exception code: 0xc0000005
Fault offset: 0x000000000003fc81
Faulting process id: 0x73fc
Faulting application start time: 0x01d9987bac3a2a5d
Faulting application path: C:\Syahmi\Devs\llama.cpp\main.exe
Faulting module path: C:\Syahmi\Devs\llama.cpp\main.exe
Report Id: 8d6237af-917a-472e-b99e-37118f6ad500
Faulting package full name: 
Faulting package-relative application ID: 

I have checked the crash; it's in the same place as @colinc reported: the ggml_view_3d calls crash because offs->data is NULL.

prsyahmi avatar Jun 06 '23 13:06 prsyahmi

Happening for me on Linux as well, I think, via libllama.so from llama-cpp-python.

...
lama_print_timings:        load time =  9348.15 ms
llama_print_timings:      sample time =   129.86 ms /    26 runs   (    4.99 ms per token)
llama_print_timings: prompt eval time =  9348.02 ms /    16 tokens (  584.25 ms per token)
llama_print_timings:        eval time = 16120.10 ms /    25 runs   (  644.80 ms per token)
llama_print_timings:       total time = 25652.71 ms
Llama._create_completion: cache save

Thread 17 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe019c36c0 (LWP 13731)]
0x00007ffe30eca776 in ggml_view_3d (ctx=0x7ffe30feb2c8 <g_state+200>, a=0x7ffafa000020, ne0=5120,
    ne1=41, ne2=40, nb1=10240, nb2=20480000, offset=0) at ggml.c:5972
5972	    memcpy(offs->data, &offset, 2*sizeof(int32_t));
(gdb) backtrace
#0  0x00007ffe30eca776 in ggml_view_3d (ctx=0x7ffe30feb2c8 <g_state+200>, a=0x7ffafa000020, ne0=5120,
    ne1=41, ne2=40, nb1=10240, nb2=20480000, offset=0) at ggml.c:5972
#1  0x00007ffe30e8d170 in llama_copy_state_data (ctx=0xc2a95f0, dst=0x7ff9c8350010 "$\032")
    at llama.cpp:2951
#2  0x00007ffff5a8ce3e in ?? () from /usr/lib/libffi.so.8
#3  0x00007ffff5a890ef in ?? () from /usr/lib/libffi.so.8
#4  0x00007ffff5a8c2c3 in ffi_call () from /usr/lib/libffi.so.8
#5  0x00007ffff59ed99b in ?? ()
   from /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so
#6  0x00007ffff59ec854 in ?? ()
   from /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so
#7  0x00007ffff7b3cb94 in ?? () from /usr/lib/libpython3.11.so.1.0
#8  0x00007ffff7b4aef6 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#9  0x00007ffff7ba2227 in ?? () from /usr/lib/libpython3.11.so.1.0
#10 0x00007ffff7b4acd8 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#11 0x00007ffff7ba6f83 in ?? () from /usr/lib/libpython3.11.so.1.0
#12 0x00007ffff7ba6acb in ?? () from /usr/lib/libpython3.11.so.1.0
#13 0x00007ffff7b8b3aa in PyObject_Call () from /usr/lib/libpython3.11.so.1.0
#14 0x00007ffff7b4f721 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#15 0x00007ffff7b7e960 in _PyFunction_Vectorcall () from /usr/lib/libpython3.11.so.1.0
#16 0x00007ffff7b4f721 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.11.so.1.0
#17 0x00007ffff7ba6f83 in ?? () from /usr/lib/libpython3.11.so.1.0
#18 0x00007ffff7ba6b08 in ?? () from /usr/lib/libpython3.11.so.1.0
#19 0x00007ffff7ca7a80 in ?? () from /usr/lib/libpython3.11.so.1.0
#20 0x00007ffff7c6a628 in ?? () from /usr/lib/libpython3.11.so.1.0
#21 0x00007ffff787208f in ?? () from /usr/lib/libc.so.6
#22 0x00007ffff79044b0 in ?? () from /usr/lib/libc.so.6

AlphaAtlas avatar Jun 07 '23 07:06 AlphaAtlas

I poked at this a bit this morning and tried increasing the copy ctx size slightly, but that doesn't seem to be the issue. It does seem like the new tensor and copy operation are not required for llama_copy/set_state_data and could be skipped for that path, if that's somehow an option. Commenting out the new lines avoids the prompt-cache segfault (at the cost of the Metal functionality, I assume).

ejones avatar Jun 07 '23 16:06 ejones

@ejones What lines did you comment out, exactly? It would be nice to be able to work around this, since the commit is required for the new quant functionality.

AlphaAtlas avatar Jun 08 '23 18:06 AlphaAtlas

This seems to get the prompt cache working at least:

diff --git a/ggml.c b/ggml.c
index 34212b8..62ac19f 100644
--- a/ggml.c
+++ b/ggml.c
@@ -5975,12 +5975,12 @@ struct ggml_tensor * ggml_view_3d(
 
     struct ggml_tensor * result = ggml_new_tensor_impl(ctx, a->type, 3, ne, (char *) a->data + offset);
 
-    ggml_scratch_save(ctx);
+    // ggml_scratch_save(ctx);
 
-    struct ggml_tensor * offs = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);
-    memcpy(offs->data, &offset, 2*sizeof(int32_t));
+    // struct ggml_tensor * offs = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);
+    // memcpy(offs->data, &offset, 2*sizeof(int32_t));
 
-    ggml_scratch_load(ctx);
+    // ggml_scratch_load(ctx);
 
     result->nb[1] = nb1;
     result->nb[2] = nb2;
@@ -5990,7 +5990,7 @@ struct ggml_tensor * ggml_view_3d(
     result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
     result->src0 = a;
     result->src1 = NULL;
-    result->opt[0] = offs;
+    // result->opt[0] = offs;
 
     if (is_node) {
         memcpy(result->padding, &offset, sizeof(offset));

ejones avatar Jun 08 '23 18:06 ejones

I broke this with the Metal changes. The "view" operators now create an opt tensor to store the offset, but this logic breaks when the context is created with no_alloc == true. Will figure out a fix now

ggerganov avatar Jun 10 '23 08:06 ggerganov
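
Until an upstream fix lands, a less invasive variant of the workaround above is to create the offs tensor only when the context actually allocates tensor data. This is only a sketch, not the eventual fix: it assumes the no_alloc flag on struct ggml_context is accessible (it is, since ggml_view_3d lives in ggml.c), and whether downstream Metal code tolerates a NULL opt[0] for no_alloc contexts is untested here:

    struct ggml_tensor * result = ggml_new_tensor_impl(ctx, a->type, 3, ne, (char *) a->data + offset);

    struct ggml_tensor * offs = NULL;
    if (!ctx->no_alloc) {
        ggml_scratch_save(ctx);

        // only materialize the offset tensor when the context can back it with memory
        offs = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);
        memcpy(offs->data, &offset, 2*sizeof(int32_t));

        ggml_scratch_load(ctx);
    }

    result->nb[1] = nb1;
    result->nb[2] = nb2;
    // ...
    result->opt[0] = offs;   // may be NULL for no_alloc contexts

Unlike the diff above, this keeps the offset tensor (and therefore the Metal path) intact for ordinary, allocating contexts; only the temporary no_alloc context used by llama_copy_state_data skips it.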