
segmentation fault Alpaca

Open sussyboiiii opened this issue 1 year ago • 33 comments

Hello, I've tried out the Alpaca model, but after a while the following error appears: "zsh: segmentation fault ./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins". Thanks.

Code:

./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
main: seed = 1679305614
llama_model_load: loading model from './models/alpaca/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/alpaca/ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: prompt: ' Below is an instruction that describes a task. Write a response that appropriately completes the request.'
main: number of tokens in prompt = 21
     1 -> ''
 13866 -> ' Below'
   338 -> ' is'
   385 -> ' an'
 15278 -> ' instruction'
   393 -> ' that'
 16612 -> ' describes'
   263 -> ' a'
  3414 -> ' task'
 29889 -> '.'
 14350 -> ' Write'
   263 -> ' a'
  2933 -> ' response'
   393 -> ' that'
  8210 -> ' appropriate'
   368 -> 'ly'
  4866 -> ' complete'
 29879 -> 's'
   278 -> ' the'
  2009 -> ' request'
 29889 -> '.'

main: interactive mode on.
main: reverse prompt: '### Instruction:

'
main: number of tokens in reverse prompt = 7
 29937 -> '#'
  2277 -> '##'
  2799 -> ' Inst'
  4080 -> 'ruction'
 29901 -> ':'
    13 -> '
'
    13 -> '
'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in '\'.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

How are you? I'm doing great! How about yourself? Same, what is the capital of the USA? The current and official Capital city for America is Washington D.C.. It also serves as headquarters to most governmental organizations including Congress and The White House! ???? #USACapital Who is president? The President of the United States of America, Donald Trump. He has held office since January 2017. His term ends in early 2025 and he will run for re-election next year as part of a Democratic sweep How old is the earth The Earth was formed about four billion years ago, but it wasn't inhabited by humans until around 2.5 million years ago with the appearance of Homo sapiens on the scene.[1] How old is the universe? The age and origin of our Universe are among some of its most fundamental mysteries, but astronomers have been able to calculate a minimum value for how long it has taken to form. Using measurements from supernovae explosions in other galaxies, which can be used as 'clocks', they estimate that the universe is between 13 and What is the largest country? The world's seven largest countries are (in descending order) China, India, United States of America, Brazil, Indonesia, Russia and Canada. Which of them has the most people? Indonesia is currently home to over 250 million inhabitants -- making it by far the largest country in population size! """ def get_country(world, year): world = dict() # Dictionary for storing countries and their populations. We'll start out
zsh: segmentation fault ./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins

sussyboiiii avatar Mar 20 '23 09:03 sussyboiiii

same problem

KyL0N avatar Mar 20 '23 10:03 KyL0N

It probably ran out of memory; I got that message when I tried to run it on a low-RAM device. Here is my stack trace, to be confirmed in your case (in the Makefile, change -DNDEBUG to -DDEBUG and add -g; a sketch of that change follows the trace below). Run:

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins
Starting program: /home/pine/llama.cpp/main -m ./models/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
main: seed = 1679306208
llama_model_load: loading model from './models/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB

Program received signal SIGSEGV, Segmentation fault.
ggml_new_tensor_impl (ctx=0xaaaaaaaff888 <g_state+8>, type=type@entry=GGML_TYPE_Q4_0, n_dims=n_dims@entry=2,
    ne=ne@entry=0xffffffffd528, data=data@entry=0x0) at ggml.c:2658
2658        if (obj_cur != NULL) {
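
For reference, the debug-build change mentioned above amounts to roughly this, assuming the stock Makefile of that era (the flag lines may differ in your checkout):

# Makefile, before:
CFLAGS   = -I.              -O3 -DNDEBUG -std=c11   -fPIC
CXXFLAGS = -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC
# after: keep symbols (-g) and drop NDEBUG so asserts fire and gdb shows source lines
CFLAGS   = -I.              -g -DDEBUG -std=c11   -fPIC
CXXFLAGS = -I. -I./examples -g -DDEBUG -std=c++17 -fPIC

Rebuild with make clean && make before starting gdb.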

msyyces8x95 avatar Mar 20 '23 11:03 msyyces8x95

Thank you for your reply, could you explain what you want me to do in a bit more detail please? About the low RAM you mentioned: I've got 16GB but was able to run the 65B model (it took 45GB of RAM). It was really slow, but because my Mac uses its SSD as swap it never ran out of memory, so shouldn't it just get slower when physical RAM runs out instead of crashing outright? (It always keeps 3GB of physical RAM unoccupied.)

sussyboiiii avatar Mar 20 '23 12:03 sussyboiiii

I checked the RAM usage and it didn't exceed 5GB

sussyboiiii avatar Mar 20 '23 16:03 sussyboiiii

I checked the RAM usage and it didn't exceed 5GB

can you run this :

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins

Then try to reproduce the seg. fault and provide the logs.

msyyces8x95 avatar Mar 20 '23 19:03 msyyces8x95

Hi, not the original reporter, but I'm having an issue with segfaults too. Running 7B with the command line specified above.

Machine spec: AMD Ryzen 7 2700, 64GB RAM, 10GB free disk space. The segfault happens every time on all models.

Branch a791a68b613b.

I removed -O3 and re-ran to make sure nothing was optimized out.

Backtrace:

Thread 1 "main" received signal SIGSEGV, Segmentation fault. __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:468

#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:468 #1 0x000055555557f5fc in ggml_compute_forward_dup_f32 (params=0x7ffffffe4870, src0=0x7ffec2235a50, dst=0x7ffec224dd50) at ggml.c:4439 #2 0x000055555557fcb4 in ggml_compute_forward_dup (params=0x7ffffffe4870, src0=0x7ffec2235a50, dst=0x7ffec224dd50) at ggml.c:4531 #3 0x00005555555866d7 in ggml_compute_forward_cpy (params=0x7ffffffe4870, src0=0x7ffec2235a50, dst=0x7ffec224dd50) at ggml.c:6916 #4 0x000055555558cd4f in ggml_compute_forward (params=0x7ffffffe4870, tensor=0x7ffec224dd50) at ggml.c:8742 #5 0x000055555558f0b0 in ggml_graph_compute (ctx=0x5555556957e8 <g_state+104>, cgraph=0x7ffffffe4a00) at ggml.c:9646 #6 0x0000555555560836 in llama_eval (model=..., n_threads=16, n_past=512, embd_inp=std::vector of length 6, capacity 8 = {...}, embd_w=std::vector of length 32000, capacity 32000 = {...}, mem_per_token=@0x7fffffffcae0: 14565444) at main.cpp:743 #7 0x0000555555561f9b in main (argc=7, argv=0x7fffffffe3c8) at main.cpp:963

SavageShrimp avatar Mar 20 '23 22:03 SavageShrimp

This looks like a memory corruption issue; I don't know if it's related to your specific CPU or a bug in the current implementation! Can you recompile with AVX disabled and test again? (To disable it, comment out lines 78 and 82 in the Makefile.)

Also, can you set a breakpoint on line 4524 before running ((gdb) b 4524) and, when it triggers, do print src0->type?

msyyces8x95 avatar Mar 20 '23 23:03 msyyces8x95

Hi, the same issue here, although it may have produced more output than it usually does before it happened. I didn't run it under gdb this time, just from the command line, but I did capture the output you asked for in a debug run beforehand.

4524	    switch (src0->type) {
(gdb) print src0->type
$1 = GGML_TYPE_F32
(gdb)

I will run it with gdb if it helps.

Just to confirm, compiler environment variables were

CFLAGS   = -I. -DDEBUG -std=c11 -fPIC
CXXFLAGS = -I. -I./examples -DDEBUG -std=c++17 -fPIC

and the commented-out parts:

        ifneq (,$(findstring AVX2,$(AVX2_M)))
#           CFLAGS += -mavx2
        endif
    else ifeq ($(UNAME_S),Linux)
        AVX1_M := $(shell grep "avx " /proc/cpuinfo)
        ifneq (,$(findstring avx,$(AVX1_M)))
#           CFLAGS += -mavx
        endif

SavageShrimp avatar Mar 21 '23 01:03 SavageShrimp

Also getting segfaults, and again, just like https://github.com/antimatter15/alpaca.cpp/issues/7, it happens after a longer interaction. So it probably has something to do with context size as well.

totoCZ avatar Mar 21 '23 01:03 totoCZ

I also get it always with Alpaca and never with LLaMA models. Intel Mac, not running out of memory or swap.

lldb backtrace for tag master-8cf9f34:

❯ lldb -c /cores/core.96845
(lldb) target create --core "/cores/core.96845"
Core file '/cores/core.96845' (x86_64) was loaded.
warning: main was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1
  * frame #0: 0x000000010ffce88d main`ggml_element_size(tensor=0x00007f93c2851b20) at ggml.c:2369:12 [opt]
    frame #1: 0x000000010ffc94c3 main`llama_eval(llama_model const&, int, int, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<float, std::__1::allocator<float> >&, unsigned long&) + 1555
    frame #2: 0x000000010ffcb53a main`main + 4362
    frame #3: 0x000000011bbea52e dyld`start + 462

aparashk avatar Mar 21 '23 02:03 aparashk

Yeah, same here (running current master branch with all re-converted models).

I added print debugging to ggml_element_size and sometimes it receives a corrupted tensor->type value so the array access segfaults.

Tensor type: -1086694353
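
For anyone reproducing this, the print debugging amounts to something like the following sketch (not the exact patch; GGML_TYPE_COUNT is the end marker of the ggml_type enum):

// ggml.c (sketch): log corrupted tensor types before the table lookup faults
size_t ggml_element_size(const struct ggml_tensor * tensor) {
    if ((unsigned) tensor->type >= GGML_TYPE_COUNT) {
        fprintf(stderr, "Tensor type: %d\n", tensor->type);
    }
    return GGML_TYPE_SIZE[tensor->type];
}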

mattsta avatar Mar 21 '23 03:03 mattsta

Same here

niltonvasques avatar Mar 21 '23 05:03 niltonvasques

I checked the RAM usage and it didn't exceed 5GB

can you run this :

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins

Then try to reproduce the seg. fault and provide the logs.

I can't; GDB doesn't work on Apple Silicon.

sussyboiiii avatar Mar 21 '23 05:03 sussyboiiii

You can use lldb instead of gdb on Macs. Also, if core dumps are enabled, you can work with that as I did above.
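
On macOS that typically means enabling core dumps in the shell before reproducing; a quick sketch, assuming a zsh/bash shell and the default /cores location:

# allow core files for this shell session, then reproduce the crash
ulimit -c unlimited
./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
# afterwards, open the newest core in lldb (as in the backtrace above)
lldb -c /cores/core.<pid>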

aparashk avatar Mar 21 '23 06:03 aparashk

This is just an out-of-bounds write to memory_k/memory_v when n_past goes past the end, ya?

If you add this assert to ggml_view_1d, it will trigger before the crash (at least for me it always hits prior to the ggml_element_size crash):

GGML_ASSERT((ne0 * GGML_TYPE_SIZE[a->type])/GGML_BLCK_SIZE[a->type] <= ggml_nbytes(a) - offset);
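
In context, the suggested assert sits at the top of ggml_view_1d, roughly like this (a sketch based on that era's ggml.c; the signature may not match current code):

// ggml.c (sketch): reject views that would end past the parent tensor's buffer,
// so we abort at the bad view instead of corrupting memory and crashing later
struct ggml_tensor * ggml_view_1d(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        int                   ne0,
        size_t                offset) {
    GGML_ASSERT((ne0 * GGML_TYPE_SIZE[a->type])/GGML_BLCK_SIZE[a->type] <= ggml_nbytes(a) - offset);

    struct ggml_tensor * result = ggml_new_tensor_impl(ctx, a->type, 1, &ne0, (char *) a->data + offset);

    return result;
}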

eiz avatar Mar 21 '23 09:03 eiz

This looks very reasonable. The question is why we don't see a problem with llama but do with alpaca...

aparashk avatar Mar 21 '23 11:03 aparashk

Hi, I get a core dump with both. Also, something is causing the output to stop; you can see where there is a blank line and I have to hit enter for it to continue.

./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin.tmp -t 8 -n 256 --temp 0.8 --top_k 60 --repeat_penalty 1.0 --color --ignore-eos -i -r "Brim:" -f query5

Gilf is narrating Brims adventure, Brim is looking for a lost token in a mansion.

Gilf: Hi Brim: Hi Gilf, good to chat with you. Gilf: It's a good night for a chat. Brim: Hi Gilf, thanks for coming, where should we look first? Gilf: Good question! Brim: Ok, but where do you think we should look. Gilf: I think I have a better plan, how do you feel about a little mansion? Brim: You are not much help today, maybe I should look first. Gilf: Maybe you should, but first we should get in. Brim: Here's a door, opens door Gilf: closes door on Brim Brim: Hmm. opens door and enters Gilf: opens door and enters Brim: Oh, look, a small chest! Gilf: opens chest Brim: What do you see? Gilf: looks in the chest and sees a token Brim: I took the token! Gilf: walks over to Brim, takes token away and gives Brim a token Brim: Ok let's go upsairs Gilf: Sure, let's go. Brim: I am going to enter this bedroom.

Gilf: Uh oh, there's a trap here. Brim: brim gets hit by rock Gilf: Oh, no! I am so sorry! Brim: It's ok. Quick, climb into the cupboard. Gilf: climbs into the cupboard Brim: brim opens a secret door Gilf: opens door and gets hit by rock Brim: Oh no!, stop it. Go through the secret door Gilf. Gilf: goes through the secret door Brim: Wow, look here, so much treasure, see if you can find any diamonds. Gilf: opens chest and finds a token Brim: Bah, too many tokens, we already have one. Gilf:
Segmentation fault (core dumped)

./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 256 --temp 0.8 --top_k 60 --repeat_penalty 1.0 --color --ignore-eos -i -r "Brim:" -f query5

Gilf is narrating Brims adventure, Brim is looking for a lost token in a mansion.

Gilf: Hi Brim: Hi Gilf, good to chat with you. Gilf: It's a good night for a chat. Brim: How do you think we are going to get in? Gilf: Hmmm. Well, we have a key, but I'm not sure that will work. Brim: Try the key Gilf. Gilf: Hmmm, it's stuck. Brim: Oh well, we tried. I am going to climb through a window. Gilf: Don't do that! Brim: I am in, quick climb through the window Gilf. Gilf: Alright, I'm coming in. Brim: Wow, look here, it's a picture of a rabbit. Gilf: It is. It's a rabbit with a cape. Brim: I wonder what that means, a rabbit with a cape? Gilf: Hmmm, maybe we will find out. Brim: Hey Gilf, do you know ascii characters? Gilf: Yes, I know them. Brim: Ok, write out 0x27[3m please.
Gilf: Done. Brim: Ok, look closely at the rabbit, do you notice anything? ?

Gilf: Yes, it's an O and a 3. Brim: It looks like a golden dice? Gilf: That's what it looks like. Brim: That must be a clue. Gilf: Well, I think it means something important is nearby. Brim: Ok, that might be true, search behind the picture. Gilf: Ok, and... nothing. Brim: good start. Gilf: Nothing huh. Brim: We enter the kitchen Gilf: We go in. Brim: Look for something orange, I'm sure it's in there. Gilf: Ok, let's look around. Brim: Have you found anything orange yet? Gilf: Nope, nothing. Brim: Strange, oh look, here's an orange towel.
Segmentation fault (core dumped)

Both produced core dumps after roughly the same amount of output.

SavageShrimp avatar Mar 21 '23 13:03 SavageShrimp

I checked the RAM usage and it didn't exceed 5GB

can you run this :

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins

Then try to reproduce the seg. fault and provide the logs.

I got this:

lldb ./main
(lldb) target create "./main"
Current executable set to '/Users/dennisruff/llama.cpp/main' (arm64).
(lldb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
Process 95762 launched: '/Users/dennisruff/llama.cpp/main' (arm64)
main: seed = 1679410592
llama_model_load: loading model from './models/alpaca/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/alpaca/ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: prompt: ' Below is an instruction that describes a task. Write a response that appropriately completes the request.'
main: number of tokens in prompt = 21
     1 -> ''
 13866 -> ' Below'
   338 -> ' is'
   385 -> ' an'
 15278 -> ' instruction'
   393 -> ' that'
 16612 -> ' describes'
   263 -> ' a'
  3414 -> ' task'
 29889 -> '.'
 14350 -> ' Write'
   263 -> ' a'
  2933 -> ' response'
   393 -> ' that'
  8210 -> ' appropriate'
   368 -> 'ly'
  4866 -> ' complete'
 29879 -> 's'
   278 -> ' the'
  2009 -> ' request'
 29889 -> '.'

main: interactive mode on.
main: reverse prompt: '### Instruction:

'
main: number of tokens in reverse prompt = 7
 29937 -> '#'
  2277 -> '##'
  2799 -> ' Inst'
  4080 -> 'ruction'
 29901 -> ':'
    13 -> '
'
    13 -> '
'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in '\'.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

Hello. Hi! What is the largest building in the world? The Pentagon, Washington DC (280 feet high), USA 714 ft x 536 ft. The Taipei 101 Building ,Taiwan 98 floors and observation deck at height of 303m What is the highest building in the world? The tallest manmade structure on earth, as well as its highest inhabitable floor currently exists within Dubai. This high rise towering over everything else was built by Emaar Properties and completed in 2 Who made the Pentagon? The Pentagon is a five-sided structure located southwest of Washington, D.C., USA. The design for this building started under President Roosevelt's Administration in 1942 and was completed by Harry S Truman during World War II as part of the war effort. How old is the Earth The age of our planet earth can be calculated using many different methods; one involves measuring layers in sedimentary rocks and estimating how long it would take for those to form given current rates. Another method measures radioactive decay, which allows scientists how big is our planet The Earth's radius at the center of its core ranges from 150 miles (243 kilometers) to over 8976.6 mi (1,443 km). The equatorial diameter is roughly 21,000 sq miles (36,000 square kilometres), with polar diameters of about 3,745 and 3,746 miles respectively The Earth's average density has been estimated to be between two to three times that of water. The mass can vary depending on the source; estimates have whats the richest man?
Process 95762 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x6f4295da8)
    frame #0: 0x000000010000882c main`ggml_element_size + 12
main`ggml_element_size:
->  0x10000882c <+12>: ldr x0, [x9, x8, lsl #3]
    0x100008830 <+16>: ret

main`ggml_init:
    0x100008834 <+0>: sub sp, sp, #0xb0
    0x100008838 <+4>: stp d13, d12, [sp, #0x20]
Target 0: (main) stopped.

A screenshot, because GitHub mangled part of the output: Bildschirmfoto 2023-03-21 um 16 02 46

sussyboiiii avatar Mar 21 '23 15:03 sussyboiiii

This looks very reasonable. The question is why we don't see a problem with llama but do with alpaca...

nah, it's reproducible with any model. The key difference is interactive mode, I think, which permits generating more tokens than the context size. We need some way of purging old data from the k/v cache; see the sketch below.
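
One possible shape for that purging, as a rough sketch rather than anything merged (n_keep and last_n_tokens are hypothetical names here): when the next eval would push n_past past n_ctx, keep a short prefix and re-insert the most recent tokens so generation can continue.

// main.cpp loop (sketch): swap out old context instead of overflowing the k/v cache
if (n_past + (int) embd.size() > n_ctx) {
    const int n_left = n_past - n_keep;   // tokens beyond the kept prefix
    n_past = n_keep;                      // restart right after the prefix

    // re-insert half of the evicted window so recent context survives
    embd.insert(embd.begin(),
                last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(),
                last_n_tokens.end()   - embd.size());
}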

eiz avatar Mar 21 '23 17:03 eiz

This looks like a duplicate of https://github.com/ggerganov/llama.cpp/issues/71?

ggerganov avatar Mar 21 '23 17:03 ggerganov

This looks like duplicate of #71 ?

yes !

msyyces8x95 avatar Mar 21 '23 17:03 msyyces8x95

I have tried the alpaca.cpp project and it worked fine; it didn't crash even after a really, really long conversation. I don't know what they did differently in alpaca.cpp, as it seems to be pretty much the same as llama.cpp, but it was running better for some reason. So I believe it's not hardware related.

sussyboiiii avatar Mar 21 '23 20:03 sussyboiiii

I have got the segmentation fault with Llama too

sussyboiiii avatar Mar 22 '23 09:03 sussyboiiii

I've captured this gdb session.
command: main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins --top_k 10000 --temp 0.96 --n_predict 512 --repeat_penalty 1 -t 3
total memory: 16GB
commit: 56e659a0b271436e24813a801640d015e7b05328

gdb:

0x000055555556950d in ggml_element_size (tensor=0x7fffe778ab30) at ggml.c:2443
2443	    return GGML_TYPE_SIZE[tensor->type];
(gdb) list
2438	float ggml_type_sizef(enum ggml_type type) {
2439	    return ((float)(GGML_TYPE_SIZE[type]))/GGML_BLCK_SIZE[type];
2440	}
2441	
2442	size_t ggml_element_size(const struct ggml_tensor * tensor) {
2443	    return GGML_TYPE_SIZE[tensor->type];
2444	}
2445	
2446	static inline bool ggml_is_scalar(const struct ggml_tensor * tensor) {
2447	    static_assert(GGML_MAX_DIMS == 4, "GGML_MAX_DIMS is not 4 - update this function");
(gdb) p tensor
$1 = (const struct ggml_tensor *) 0x7fffe778ab30
(gdb) p tensor->type
$2 = 3176610589
(gdb) p sizeof(GGML_TYPE_SIZE)
$3 = 56
(gdb) backtrace 
#0  0x000055555556950d in ggml_element_size (tensor=0x7fffe778ab30) at ggml.c:2443
#1  0x000055555557b8a2 in llama_eval_internal (lctx=..., tokens=<optimized out>, n_tokens=1, n_past=518, 
    n_threads=<optimized out>) at llama.cpp:686
#2  0x000055555557bf2d in llama_eval (ctx=<optimized out>, tokens=<optimized out>, n_tokens=<optimized out>, 
    n_past=<optimized out>, n_threads=<optimized out>) at llama.cpp:1445
#3  0x000055555555c93d in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:323
(gdb) frame 1
#1  0x000055555557b8a2 in llama_eval_internal (lctx=..., tokens=<optimized out>, n_tokens=1, n_past=518, 
    n_threads=<optimized out>) at llama.cpp:686
686	                struct ggml_tensor * v = ggml_view_1d(ctx0, model.memory_v, N*n_embd, (ggml_element_size(model.memory_v)*n_embd)*(il*n_ctx + n_past));
(gdb) list
681	            struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv, cur);
682	
683	            // store key and value to memory
684	            if (N >= 1) {
685	                struct ggml_tensor * k = ggml_view_1d(ctx0, model.memory_k, N*n_embd, (ggml_element_size(model.memory_k)*n_embd)*(il*n_ctx + n_past));
686	                struct ggml_tensor * v = ggml_view_1d(ctx0, model.memory_v, N*n_embd, (ggml_element_size(model.memory_v)*n_embd)*(il*n_ctx + n_past));
687	
688	                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Kcur, k));
689	                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Vcur, v));
690	            }
(gdb) p il
$4 = 0
(gdb) p n_tokens
$5 = 1
(gdb) p n_past
$6 = 518
(gdb) f 2
#2  0x000055555557bf2d in llama_eval (ctx=<optimized out>, tokens=<optimized out>, n_tokens=<optimized out>, 
    n_past=<optimized out>, n_threads=<optimized out>) at llama.cpp:1445
1445	    if (!llama_eval_internal(*ctx, tokens, n_tokens, n_past, n_threads)) {
(gdb) list
1440	        struct llama_context * ctx,
1441	           const llama_token * tokens,
1442	                         int   n_tokens,
1443	                         int   n_past,
1444	                         int   n_threads) {
1445	    if (!llama_eval_internal(*ctx, tokens, n_tokens, n_past, n_threads)) {
1446	        fprintf(stderr, "%s: failed to eval\n", __func__);
1447	        return 1;
1448	    }
1449	
(gdb) f 3
#3  0x000055555555c93d in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:323
323	            if (llama_eval(ctx, embd.data(), embd.size(), n_past, params.n_threads)) {
(gdb) list
318	    set_console_state(CONSOLE_STATE_PROMPT);
319	
320	    while (remaining_tokens > 0 || params.interactive) {
321	        // predict
322	        if (embd.size() > 0) {
323	            if (llama_eval(ctx, embd.data(), embd.size(), n_past, params.n_threads)) {
324	                fprintf(stderr, "%s : failed to eval\n", __func__);
325	                return 1;
326	            }
327	        }
(gdb) 
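
The numbers in that session make the overrun concrete: memory_k/memory_v are sized for n_layer * n_ctx * n_embd elements, and the view offset at llama.cpp:686 grows linearly with n_past, so once n_past exceeds n_ctx the views for the upper layers start past the end of the cache. A standalone sketch of the arithmetic, using the 7B dimensions from the logs above:

// sketch: the offset math from llama.cpp:686, with n_past = 518 from the gdb session
#include <assert.h>

int main(void) {
    const long n_layer = 32, n_ctx = 512, n_embd = 4096;
    const long n_past  = 518;
    const long cache_elems = n_layer * n_ctx * n_embd;      // 67,108,864 elements total

    const long il = n_layer - 1;                            // last layer
    const long view_start = n_embd * (il*n_ctx + n_past);   // 4096 * 16390 = 67,133,440

    assert(view_start > cache_elems);  // the view starts past the end of the cache
    return 0;
}

Even for il = 0 the write lands in layer 1's slice of the cache, so everything past n_ctx scribbles over someone else's memory, consistent with the corrupted tensor->type values seen above.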

lzace817 avatar Mar 22 '23 16:03 lzace817

Segmentation fault caused by an unchecked NULL pointer when the memory pool gets full? https://github.com/ggerganov/llama.cpp/issues/373#issuecomment-1479948004
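
If that is the failure mode, the defensive fix is to check the allocation result before using it; a minimal caller-side sketch (ctx0 and n stand in for the surrounding code's context and size), not the actual patch under discussion:

// sketch: fail loudly when the ggml context pool is exhausted, instead of
// letting a NULL tensor propagate into a segfault much later
struct ggml_tensor * t = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, n);
if (t == NULL) {
    fprintf(stderr, "ggml: context memory pool exhausted\n");
    abort();
}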

mqy avatar Mar 22 '23 17:03 mqy

Same as reported previously: something is corrupting tensor->type to a value far larger than the 7-element array it indexes into:

(gdb) p tensor->type
$2 = 3176610589
(gdb) p sizeof(GGML_TYPE_SIZE)
$3 = 56 (which is 7 elements because: (56 / 8) == 7 elements)

2442	size_t ggml_element_size(const struct ggml_tensor * tensor) {
2443	    return GGML_TYPE_SIZE[tensor->type];
2444	}

mattsta avatar Mar 22 '23 18:03 mattsta

Looks like it happens every time n_past goes over n_ctx. @mattsta, could you check if you still segfault with this patch?

diff --git a/main.cpp b/main.cpp
index fbb43a8..866da4d 100644
--- a/main.cpp
+++ b/main.cpp
@@ -327,6 +327,10 @@ int main(int argc, char ** argv) {
         }
 
         n_past += embd.size();
+        if (n_past > params.n_ctx) {
+            fprintf(stderr, "ERROR: segfault awaits.\nn_past should go past than n_ctx?\n");
+            exit(1);
+        }
         embd.clear();
 
         if ((int) embd_inp.size() <= input_consumed) {

lzace817 avatar Mar 22 '23 20:03 lzace817

Since the alpaca.cpp project currently does not exhibit this issue, and given when these reports started appearing, the problem can most likely be traced back to the tokenizer change and new model format: https://github.com/ggerganov/llama.cpp/pull/252

anzz1 avatar Mar 22 '23 23:03 anzz1

please try #438 and see if it fixes the problem.

Green-Sky avatar Mar 23 '23 19:03 Green-Sky

@Green-Sky, after some valid output it prints an 'M' that doesn't appear to be part of the output, and then segfaults. I guess n_past should be <= n_ctx, but I don't know what those are; I suppose it keeps pushing more and more stuff into the memory storing the context until it explodes. I remember reading somewhere in the source that n_past is the past context, so it should fit inside the whole context. In my experiments it looks like interactive mode is not context aware; shouldn't the context be restored to its default state for each interaction?

lzace817 avatar Mar 23 '23 21:03 lzace817