Bug: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed.
What happened?
My model architecture is based on nanoGPT (GPT-2). When I run inference, I hit the following failure: "llama.cpp/ggml/src/ggml.c:13698: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed."
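In case it helps triage, here is a minimal standalone sketch (my own simplified code, not the actual ggml source) of a per-row f32 soft-max with the same `sum > 0.0` check. It shows that a row that is entirely `-inf` (e.g. a fully masked attention row) or that contains a NaN makes the assertion fire even when the stored magnitudes look unremarkable:

```c
#include <assert.h>
#include <math.h>

// Simplified per-row f32 soft-max in the same shape as
// ggml_compute_forward_soft_max_f32: subtract the row max, exponentiate,
// sum, assert, normalize. The `sum > 0.0` assertion fails when the row
// produces no positive terms: if every element is -INFINITY, then
// x[i] - max is -inf - (-inf) == NaN and the sum is NaN; a single NaN
// input also poisons the sum. NaN > 0.0 is false, so the assert fires.
static void soft_max_row_f32(const float * x, float * y, int n) {
    float max = -INFINITY;
    for (int i = 0; i < n; i++) {
        if (x[i] > max) max = x[i]; // a NaN fails this compare silently
    }
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        y[i] = expf(x[i] - max);    // expf(NaN) == NaN
        sum += y[i];
    }
    assert(sum > 0.0);              // fires for all -inf rows and NaN rows
    for (int i = 0; i < n; i++) {
        y[i] /= (float) sum;
    }
}

int main(void) {
    float row[4] = { -INFINITY, -INFINITY, -INFINITY, -INFINITY };
    float out[4];
    soft_max_row_f32(row, out, 4);  // reproduces the `sum > 0.0` failure
    return 0;
}
```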
I debugged the code; the call stack at the point of failure is shown in the attached screenshot.
I have run the test several times. Interestingly, with a different number of threads the fault occurs at a different loop index (the `for` loop index `i1` in `ggml_compute_forward_soft_max_f32()`; the attached figure shows the run with the number of threads set to 4). I dumped the tensor passed into `ggml_compute_forward_soft_max()`, named `kq-0`, and checked its data: no outliers were found. I am uploading the input tensor data as an attachment:
dump_kq-0_shp_1_25_51_64.txt
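To rule out non-finite values rather than just outliers, this is a quick standalone check over the dump (assuming the file is plain whitespace-separated floats; the parsing would need adjusting for the actual dump format):

```c
#include <math.h>
#include <stdio.h>

// Scan a tensor dump for NaN/inf values. A clean value range is not
// enough to clear the tensor: a fully masked attention row (-inf
// everywhere) or a single NaN also trips the `sum > 0.0` assertion,
// so count non-finite entries explicitly.
int main(void) {
    FILE * f = fopen("dump_kq-0_shp_1_25_51_64.txt", "r");
    if (!f) { perror("fopen"); return 1; }
    float v;
    long n = 0, n_nan = 0, n_inf = 0;
    while (fscanf(f, "%f", &v) == 1) { // stops at the first non-numeric token
        n++;
        if (isnan(v)) n_nan++;
        if (isinf(v)) n_inf++;
    }
    fclose(f);
    printf("values: %ld, NaN: %ld, inf: %ld\n", n, n_nan, n_inf);
    return 0;
}
```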
Name and Version
version: 3486 (0832de72) built with gcc (GCC) 13.2.0 for x86_64-pc-linux-gnu
What operating system are you seeing the problem on?
No response
Relevant log output
llama.cpp/ggml/src/ggml.c:13698: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed.