All outputs are "!"
As shown in the picture, every inference output is "!".
I tried different approaches and found that the problem goes away only when I build with -O0 optimization. But inference runs far too slowly at -O0, and the problem reappears at -O1, -O2, and -O3.
Has anyone run into the same problem? How can it be solved?
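For context: a bug that disappears at -O0 but shows up at -O1 and above is a classic symptom of undefined behavior, such as reading uninitialized memory, because the optimizer is allowed to assume it never happens. A minimal, hypothetical C++ sketch (not taken from the distributed-llama sources) of how this can look:

```cpp
#include <cstdio>

int main() {
    float logits[4]; // never initialized: reading these values is undefined behavior
    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        sum += logits[i]; // -O0 often happens to see zeroed stack memory,
                          // while -O1/-O2/-O3 may yield arbitrary garbage
    }
    printf("sum = %f\n", sum); // the printed value can differ per optimization level
    return 0;
}
```

If something similar happens inside the model code, every sampled token could decode to the same junk character.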
This is very weird. What CPU/OS?
Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, Ubuntu 18.04
Even with -O0, it sometimes produces meaningless output.
Which model?
dllama_model_llama3_8b_q40.m
Could you check again on version 0.12.1? Many memory leaks have been fixed. Please re-download the model and tokenizer.
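If it still reproduces on 0.12.1, a generic way to narrow it down (an illustrative sketch, not a project-specific recipe; the file name is made up) is to rebuild with AddressSanitizer and UndefinedBehaviorSanitizer, which catch this class of bug even at higher optimization levels:

```cpp
// asan_demo.cpp -- hypothetical example of the kind of error ASan reports.
// Build and run with: g++ -O1 -g -fsanitize=address,undefined asan_demo.cpp && ./a.out
#include <cstdlib>

int main() {
    float* weights = (float*)malloc(4 * sizeof(float));
    weights[4] = 1.0f; // one element past the end: ASan aborts with a
                       // heap-buffer-overflow report pointing at this line
    free(weights);
    return 0;
}
```

Rebuilding the actual inference binary with the same flags should point directly at the faulty read or write.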