All outputs are "!"
As shown in the picture, every inference output is "!".
I tried different approaches and found that the problem goes away only when I build with -O0 optimization. But inference runs far too slowly at -O0, and the problem reappears at -O1, -O2, and -O3.
Has anyone run into the same problem? How can it be solved?
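For context: a bug that disappears at -O0 but shows up at -O1 and above is a classic symptom of undefined behavior, such as reading uninitialized memory, because the optimizer is allowed to assume it never happens. A minimal, hypothetical C++ sketch (not taken from the distributed-llama sources) of how this can look:

```cpp
#include <cstdio>

int main() {
    float logits[4]; // never initialized: reading these values is undefined behavior
    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        sum += logits[i]; // -O0 often happens to see zeroed stack memory,
                          // while -O1/-O2/-O3 may yield arbitrary garbage
    }
    printf("sum = %f\n", sum); // the printed value can differ per optimization level
    return 0;
}
```

If something similar happens inside the model code, every sampled token could decode to the same junk character.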
This is very weird. What CPU/OS?
Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, Ubuntu 18.04
Even with -O0, it sometimes produces meaningless output.
Which model?
dllama_model_llama3_8b_q40.m
Could you check again on version 0.12.1? Many memory leaks have been fixed. Please re-download the model and tokenizer.
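If it still reproduces on 0.12.1, a generic way to narrow it down (an illustrative sketch, not a project-specific recipe; the file name is made up) is to rebuild with AddressSanitizer and UndefinedBehaviorSanitizer, which catch this class of bug even at higher optimization levels:

```cpp
// asan_demo.cpp -- hypothetical example of the kind of error ASan reports.
// Build and run with: g++ -O1 -g -fsanitize=address,undefined asan_demo.cpp && ./a.out
#include <cstdlib>

int main() {
    float* weights = (float*)malloc(4 * sizeof(float));
    weights[4] = 1.0f; // one element past the end: ASan aborts with a
                       // heap-buffer-overflow report pointing at this line
    free(weights);
    return 0;
}
```

Rebuilding the actual inference binary with the same flags should point directly at the faulty read or write.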