llama.cpp
Baby-llama.cpp report bus error
System: macOS Ventura 13.2.1, CPU: M2 Pro
Reproduction process:
- git clone llama.cpp
- cd llama.cpp
- make
- ./baby-llama
The terminal then prints:
init model
init_kv_cache
zsh: bus error ./baby-llama
A similar error occurs on my Linux server.
System: Ubuntu 22.04.2 LTS, Architecture: x86_64, CPU(s): 128, Model name: Intel(R) Xeon(R) CPU @ 2.90GHz
Following the same steps, the terminal prints the following error:
init model
init_kv_cache
Floating point exception (core dumped)
I have encountered the same issue while trying to run this example. I am trying to implement something similar to what is described in this issue. If no one else is on it, I'd like to try finding and implementing a fix.
After further investigation, I've pinpointed the issue within the `forward_batch` function, specifically at line 988, which calls `ggml_build_forward_expand(gf, inpL)`. The root of the problem appears to be that the size of `gf->visited_hash_table` is zero at the time of this call.
Here's a breakdown of the call sequence leading to the exception:
- `ggml_build_forward_expand(gf, inpL)` is invoked, passing the computation graph `gf` as an argument.
- Inside `ggml_build_forward_expand`, the computation graph `gf` is passed to `ggml_visit_parents`.
- `ggml_visit_parents` then passes the graph's hash table to `ggml_hash_insert`.
- Finally, `ggml_hash_insert` calls `ggml_hash_find`, where the issue manifests. The problematic line in `ggml_hash_find` is the first one:
```c
size_t h = ggml_hash(key) % hash_set.size;
```
It seems that `hash_set.size` is zero here, so the modulo is an integer division by zero and raises SIGFPE (arithmetic exception). The division by zero in `ggml_hash_find` therefore appears to be the immediate cause of the crash.
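For illustration only, here is a minimal, self-contained sketch of an open-addressing lookup in the same shape as `ggml_hash_find` (the `toy_*` names and the struct layout are invented for this sketch, not the real ggml code). With `size == 0`, the modulo on the first line is an integer division by zero, which traps as SIGFPE on x86_64; on ARM the division does not trap, so the macOS bus error is plausibly a downstream symptom of the same uninitialized hash table:

```c
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for ggml's hash set; struct and names are illustrative.
struct toy_hash_set {
    size_t size;          // number of slots; zero is the failure mode seen above
    const void ** keys;   // slot array, NULL marks an empty slot
};

// Toy pointer hash, analogous in spirit to ggml_hash().
static size_t toy_hash(const void * key) {
    return (size_t)(uintptr_t)key >> 4;
}

// Linear-probing lookup: when hs->size is 0, the first modulo is an integer
// division by zero (undefined behavior; SIGFPE on x86_64).
static size_t toy_hash_find(const struct toy_hash_set * hs, const void * key) {
    size_t h = toy_hash(key) % hs->size;
    size_t i = h;
    while (hs->keys[i] != NULL && hs->keys[i] != key) {
        i = (i + 1) % hs->size;
        if (i == h) {
            return (size_t)-1;  // table is full and the key is absent
        }
    }
    return i;
}

int main(void) {
    const void * keys[4] = {0};
    struct toy_hash_set hs = { .size = 4, .keys = keys };
    int dummy;
    // Works with size == 4; with hs.size set to 0 this call would divide by zero.
    return (int) toy_hash_find(&hs, &dummy);
}
```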
I'm currently exploring why gf->visited_hash_table's size is zero at this point in the execution and how we can ensure it's properly initialized before reaching this critical operation. Any further suggestions would be greatly appreciated.
Okay, I was able to fix the issue by changing the lines that contain `gf = {}` in the main function to:

```c
struct ggml_cgraph * gf = NULL;
gf = ggml_new_graph_custom(ctx0, LLAMA_TRAIN_MAX_NODES, true);
```
I am not sure this is the best way to go about it, but it works for now; any suggestions would be appreciated. I can also try to push the fix.
@NawafAlansari That looks correct. When the baby-llama example was created, graphs had a fixed size and could be allocated on the stack, but that was changed a while ago, and now graphs need to be allocated in a `ggml_context` (see https://github.com/ggerganov/ggml/issues/567 for more details). A PR to fix the example would be very welcome.
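For reference, below is a minimal sketch of what that looks like: allocating the graph inside a `ggml_context` instead of on the stack. The context sizing and the `EXAMPLE_MAX_NODES` constant are illustrative assumptions; the actual example uses `LLAMA_TRAIN_MAX_NODES` and its own context setup:

```c
#include "ggml.h"

// Illustrative node budget; the baby-llama example uses LLAMA_TRAIN_MAX_NODES.
#define EXAMPLE_MAX_NODES 4096

int main(void) {
    // Reserve enough context memory for the graph metadata plus some tensors
    // (sizes here are rough, illustrative estimates).
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_graph_overhead_custom(EXAMPLE_MAX_NODES, /*grads =*/ true)
                          + 1024*ggml_tensor_overhead() + 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx0 = ggml_init(params);

    // Graphs can no longer live on the stack; allocate them in the context.
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx0, EXAMPLE_MAX_NODES, /*grads =*/ true);

    // ... build the model's forward pass and call
    // ggml_build_forward_expand(gf, output) here ...
    (void) gf;

    ggml_free(ctx0);
    return 0;
}
```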
@slaren I just made a PR with a fix for the example.
This issue was closed because it has been inactive for 14 days since being marked as stale.