llama.cpp
Baby-llama.cpp report bus error
System: macOS Ventura 13.2.1, CPU: M2 Pro
Reproduction process:
- git clone llama.cpp
- cd llama.cpp
- make
- ./baby-llama
The terminal then prints:
init model
init_kv_cache
zsh: bus error ./baby-llama
A similar error occurs on my Linux server.
System: Ubuntu 22.04.2 LTS, Architecture: x86_64, CPU(s): 128, Model name: Intel(R) Xeon(R) CPU @ 2.90GHz
Following the same steps, the terminal prints the following error:
init model
init_kv_cache
Floating point exception (core dumped)
I have encountered the same issue while trying to run this example. I am trying to implement something similar to what is described in this issue. If no one else is on it, I'd like to try finding and implementing a fix.
After further investigation, I've pinpointed the issue within the `forward_batch` function, specifically at line 988, which calls `ggml_build_forward_expand(gf, inpL)`. The root of the problem appears to be that the size of `gf->visited_hash_table` is zero at the time of this call.
Here's a breakdown of the call sequence leading to the exception:
- `ggml_build_forward_expand(gf, inpL)` is invoked, passing the computation graph `gf` as an argument.
- Inside `ggml_build_forward_expand`, the computation graph `gf` is passed to `ggml_visit_parents`.
- `ggml_visit_parents` then passes the graph's hash table to `ggml_hash_insert`.
- Finally, `ggml_hash_insert` calls `ggml_hash_find`, where the issue manifests. The problematic line in `ggml_hash_find` is the first one:
```c
size_t h = ggml_hash(key) % hash_set.size;
```
It seems that `hash_set.size` is zero here, so the modulo is an integer division by zero and raises SIGFPE (arithmetic exception). The division by zero in `ggml_hash_find` therefore appears to be the immediate cause of the crash.
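For illustration only, here is a minimal, self-contained sketch of an open-addressing lookup in the same shape as `ggml_hash_find` (the `toy_*` names and the struct layout are invented for this sketch, not the real ggml code). With `size == 0`, the modulo on the first line is an integer division by zero, which traps as SIGFPE on x86_64; on ARM the division does not trap, so the macOS bus error is plausibly a downstream symptom of the same uninitialized hash table:

```c
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for ggml's hash set; struct and names are illustrative.
struct toy_hash_set {
    size_t size;          // number of slots; zero is the failure mode seen above
    const void ** keys;   // slot array, NULL marks an empty slot
};

// Toy pointer hash, analogous in spirit to ggml_hash().
static size_t toy_hash(const void * key) {
    return (size_t)(uintptr_t)key >> 4;
}

// Linear-probing lookup: when hs->size is 0, the first modulo is an integer
// division by zero (undefined behavior; SIGFPE on x86_64).
static size_t toy_hash_find(const struct toy_hash_set * hs, const void * key) {
    size_t h = toy_hash(key) % hs->size;
    size_t i = h;
    while (hs->keys[i] != NULL && hs->keys[i] != key) {
        i = (i + 1) % hs->size;
        if (i == h) {
            return (size_t)-1;  // table is full and the key is absent
        }
    }
    return i;
}

int main(void) {
    const void * keys[4] = {0};
    struct toy_hash_set hs = { .size = 4, .keys = keys };
    int dummy;
    // Works with size == 4; with hs.size set to 0 this call would divide by zero.
    return (int) toy_hash_find(&hs, &dummy);
}
```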
I'm currently exploring why gf->visited_hash_table's size is zero at this point in the execution and how we can ensure it's properly initialized before reaching this critical operation. Any further suggestions would be greatly appreciated.
Okay, I was able to fix the issue by changing the lines that contain `gf = {}` in the main function to:

```c
struct ggml_cgraph * gf = NULL;
gf = ggml_new_graph_custom(ctx0, LLAMA_TRAIN_MAX_NODES, true);
```
I am not sure this is the best way to go about it, but it works for now; any suggestions would be appreciated. I can also try to push the fix.
@NawafAlansari That looks correct. When the baby-llama example was created, graphs had a fixed size and could be allocated on the stack, but that was changed a while ago, and now graphs need to be allocated in a `ggml_context` (see https://github.com/ggerganov/ggml/issues/567 for more details). A PR to fix the example would be very welcome.
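For reference, below is a minimal sketch of what that looks like: allocating the graph inside a `ggml_context` instead of on the stack. The context sizing and the `EXAMPLE_MAX_NODES` constant are illustrative assumptions; the actual example uses `LLAMA_TRAIN_MAX_NODES` and its own context setup:

```c
#include "ggml.h"

// Illustrative node budget; the baby-llama example uses LLAMA_TRAIN_MAX_NODES.
#define EXAMPLE_MAX_NODES 4096

int main(void) {
    // Reserve enough context memory for the graph metadata plus some tensors
    // (sizes here are rough, illustrative estimates).
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_graph_overhead_custom(EXAMPLE_MAX_NODES, /*grads =*/ true)
                          + 1024*ggml_tensor_overhead() + 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx0 = ggml_init(params);

    // Graphs can no longer live on the stack; allocate them in the context.
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx0, EXAMPLE_MAX_NODES, /*grads =*/ true);

    // ... build the model's forward pass and call
    // ggml_build_forward_expand(gf, output) here ...
    (void) gf;

    ggml_free(ctx0);
    return 0;
}
```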
@slaren I just made a PR with a fix for the example.
This issue was closed because it has been inactive for 14 days since being marked as stale.