segmentation fault
When I try to start the deepseek_r1_distill_llama_8b_q40 model on my Raspberry Pi 4B 8GB machine, it fails with a segmentation fault as shown below. The same Pi can successfully run the smaller llama 1b model.
sudo nice -n -20 ./dllama chat --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 3 --max-seq-len 256
[sudo] password for zhangddjs:
BosId: 128000 (<｜begin▁of▁sentence｜>)
EosId: 128001 (<｜end▁of▁sentence｜>) 128001 (<｜end▁of▁sentence｜>)
RegularVocabSize: 128000
SpecialVocabSize: 256
Arch: Llama
HiddenAct: Silu
Dim: 4096
KvDim: 1024
HiddenDim: 14336
VocabSize: 128256
nLayers: 32
nHeads: 32
nKvHeads: 8
OrigSeqLen: 131072
SeqLen: 256
NormEpsilon: 0.000010
RopeType: Llama3.1
RopeTheta: 500000
RopeScaling: f=8.0, l=1.0, h=4.0, o=8192
RequiredMemory: 6285070 kB
CPU: neon fp16
Loading weights...
[1] 36355 segmentation fault  sudo nice -n -20 ./dllama chat --model --tokenizer --buffer-float-type q80
FAIL
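As a rough sanity check on the header values above (this is not dllama's actual accounting), the weight footprint of a standard Llama model with these dimensions can be estimated, assuming a Q4_0-style quantization of roughly 4.5 bits per weight:

```python
# Hypothetical estimate from the header printed above; assumes a standard
# Llama layout (wq/wk/wv/wo + gate/down/up MLP) and ~4.5 bits/weight for q40.
dim, kv_dim, hidden_dim = 4096, 1024, 14336
n_layers, vocab_size = 32, 128256

per_layer = (
    dim * dim               # wq
    + dim * kv_dim          # wk
    + dim * kv_dim          # wv
    + dim * dim             # wo
    + 3 * dim * hidden_dim  # gate, down, up projections
)
params = n_layers * per_layer + 2 * vocab_size * dim  # + embeddings and output head
q40_bytes = params * 4.5 / 8

print(f"{params / 1e9:.2f} B params, ~{q40_bytes / 2**30:.1f} GiB of q40 weights")
# -> 8.03 B params, ~4.2 GiB of q40 weights
```

The weights alone come to roughly 4.2 GiB, so the reported RequiredMemory of 6285070 kB (about 6 GiB, which additionally covers the KV cache and compute buffers) looks plausible and should in principle fit in 8 GB of RAM.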
I have also tried on a MacBook Pro (M4 Pro), and it failed with the same problem.
The launch command and output:
python launch.py deepseek_r1_distill_llama_8b_q40 [5:21:18]
Downloading deepseek_r1_distill_llama_8b_q40 to models/deepseek_r1_distill_llama_8b_q40...
https://huggingface.co/b4rtaz/DeepSeek-R1-Distill-Llama-8B-Distributed-Llama/resolve/main/dllama_model_deepseek-r1-distill-llama-8b_q40.m?download=true (attempt: 0)
Downloaded 5545 MB
https://huggingface.co/b4rtaz/DeepSeek-R1-Distill-Llama-8B-Distributed-Llama/resolve/main/dllama_tokenizer_deepseek-r1-distill-llama-8b.t?download=true (attempt: 0)
Downloaded 1 MB
All files are downloaded
To run Distributed Llama you need to execute:
--- copy start ---
./dllama chat --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 4 --max-seq-len 4096
--- copy end -----
Created run_deepseek_r1_distill_llama_8b_q40.sh script to easy run
Do you want to run Distributed Llama? ("Y" if yes): n
Hi, do you have enough free RAM on your systems? Dllama doesn't seem to check if the model will fit into RAM.
Hi @D-i-t-gh, I have enough free RAM (10 GB), while only about 6 GB is required here: RequiredMemory: 6285070 kB
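For anyone hitting the same question, a quick way to compare available memory against the figure dllama prints is a short shell check (a sketch: the 6285070 kB value is copied from the log above, and /proc/meminfo is Linux-only, so this applies to the Raspberry Pi but not the Mac):

```shell
# Compare MemAvailable against the RequiredMemory figure from the dllama log.
required_kb=6285070   # taken from the "RequiredMemory" line above
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
echo "available: ${avail_kb} kB, required: ${required_kb} kB"
if [ "$avail_kb" -ge "$required_kb" ]; then
  echo "model should fit in RAM"
else
  echo "not enough free RAM"
fi
```

MemAvailable is a better guide than MemFree here, since it accounts for reclaimable page cache.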