Bug: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed.
What happened?
My model architecture is based on nanoGPT (GPT-2). When I run inference, I hit the following failure: "llama.cpp/ggml/src/ggml.c:13698: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed."
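In case it helps triage, here is a minimal standalone sketch (my own simplified code, not the actual ggml source) of a per-row f32 soft-max with the same `sum > 0.0` check. It shows that a row that is entirely `-inf` (e.g. a fully masked attention row) or that contains a NaN makes the assertion fire even when the stored magnitudes look unremarkable:

```c
#include <assert.h>
#include <math.h>

// Simplified per-row f32 soft-max in the same shape as
// ggml_compute_forward_soft_max_f32: subtract the row max, exponentiate,
// sum, assert, normalize. The `sum > 0.0` assertion fails when the row
// produces no positive terms: if every element is -INFINITY, then
// x[i] - max is -inf - (-inf) == NaN and the sum is NaN; a single NaN
// input also poisons the sum. NaN > 0.0 is false, so the assert fires.
static void soft_max_row_f32(const float * x, float * y, int n) {
    float max = -INFINITY;
    for (int i = 0; i < n; i++) {
        if (x[i] > max) max = x[i]; // a NaN fails this compare silently
    }
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        y[i] = expf(x[i] - max);    // expf(NaN) == NaN
        sum += y[i];
    }
    assert(sum > 0.0);              // fires for all -inf rows and NaN rows
    for (int i = 0; i < n; i++) {
        y[i] /= (float) sum;
    }
}

int main(void) {
    float row[4] = { -INFINITY, -INFINITY, -INFINITY, -INFINITY };
    float out[4];
    soft_max_row_f32(row, out, 4);  // reproduces the `sum > 0.0` failure
    return 0;
}
```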
I debugged the code; the call stack at the point of failure is shown in the attached screenshot.
I have run the test several times. Interestingly, with a different number of threads the fault occurs at a different loop index (the `for` loop index `i1` in `ggml_compute_forward_soft_max_f32()`; the attached figure shows the run with the number of threads set to 4). I dumped the tensor passed into `ggml_compute_forward_soft_max()`, named `kq-0`, and checked its data: no outliers were found. I am uploading the input tensor data as an attachment:
dump_kq-0_shp_1_25_51_64.txt
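To rule out non-finite values rather than just outliers, this is a quick standalone check over the dump (assuming the file is plain whitespace-separated floats; the parsing would need adjusting for the actual dump format):

```c
#include <math.h>
#include <stdio.h>

// Scan a tensor dump for NaN/inf values. A clean value range is not
// enough to clear the tensor: a fully masked attention row (-inf
// everywhere) or a single NaN also trips the `sum > 0.0` assertion,
// so count non-finite entries explicitly.
int main(void) {
    FILE * f = fopen("dump_kq-0_shp_1_25_51_64.txt", "r");
    if (!f) { perror("fopen"); return 1; }
    float v;
    long n = 0, n_nan = 0, n_inf = 0;
    while (fscanf(f, "%f", &v) == 1) { // stops at the first non-numeric token
        n++;
        if (isnan(v)) n_nan++;
        if (isinf(v)) n_inf++;
    }
    fclose(f);
    printf("values: %ld, NaN: %ld, inf: %ld\n", n, n_nan, n_inf);
    return 0;
}
```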
Name and Version
version: 3486 (0832de72) built with gcc (GCC) 13.2.0 for x86_64-pc-linux-gnu
What operating system are you seeing the problem on?
No response
Relevant log output
llama.cpp/ggml/src/ggml.c:13698: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed.