segmentation fault
When I try to start the deepseek_r1_distill_llama_8b_q40 model on my Raspberry Pi 4B 8GB machine, it fails with a segmentation fault as shown below. The same Pi can successfully run the smaller llama 1b model.
sudo nice -n -20 ./dllama chat --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 3 --max-seq-len 256
[sudo] password for zhangddjs:
BosId: 128000 (<｜begin▁of▁sentence｜>)
EosId: 128001 (<｜end▁of▁sentence｜>) 128001 (<｜end▁of▁sentence｜>)
RegularVocabSize: 128000
SpecialVocabSize: 256
Arch: Llama
HiddenAct: Silu
Dim: 4096
KvDim: 1024
HiddenDim: 14336
VocabSize: 128256
nLayers: 32
nHeads: 32
nKvHeads: 8
OrigSeqLen: 131072
SeqLen: 256
NormEpsilon: 0.000010
RopeType: Llama3.1
RopeTheta: 500000
RopeScaling: f=8.0, l=1.0, h=4.0, o=8192
RequiredMemory: 6285070 kB
CPU: neon fp16
Loading weights...
[1] 36355 segmentation fault  sudo nice -n -20 ./dllama chat --model --tokenizer --buffer-float-type q80
FAIL
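As a rough sanity check on the header values above (this is not dllama's actual accounting), the weight footprint of a standard Llama model with these dimensions can be estimated, assuming a Q4_0-style quantization of roughly 4.5 bits per weight:

```python
# Hypothetical estimate from the header printed above; assumes a standard
# Llama layout (wq/wk/wv/wo + gate/down/up MLP) and ~4.5 bits/weight for q40.
dim, kv_dim, hidden_dim = 4096, 1024, 14336
n_layers, vocab_size = 32, 128256

per_layer = (
    dim * dim               # wq
    + dim * kv_dim          # wk
    + dim * kv_dim          # wv
    + dim * dim             # wo
    + 3 * dim * hidden_dim  # gate, down, up projections
)
params = n_layers * per_layer + 2 * vocab_size * dim  # + embeddings and output head
q40_bytes = params * 4.5 / 8

print(f"{params / 1e9:.2f} B params, ~{q40_bytes / 2**30:.1f} GiB of q40 weights")
# -> 8.03 B params, ~4.2 GiB of q40 weights
```

The weights alone come to roughly 4.2 GiB, so the reported RequiredMemory of 6285070 kB (about 6 GiB, which additionally covers the KV cache and compute buffers) looks plausible and should in principle fit in 8 GB of RAM.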
I have also tried on a MacBook Pro (M4 Pro), and it failed with the same problem.
The launch command and output:
python launch.py deepseek_r1_distill_llama_8b_q40 [5:21:18]
Downloading deepseek_r1_distill_llama_8b_q40 to models/deepseek_r1_distill_llama_8b_q40...
https://huggingface.co/b4rtaz/DeepSeek-R1-Distill-Llama-8B-Distributed-Llama/resolve/main/dllama_model_deepseek-r1-distill-llama-8b_q40.m?download=true (attempt: 0)
Downloaded 5545 MB
https://huggingface.co/b4rtaz/DeepSeek-R1-Distill-Llama-8B-Distributed-Llama/resolve/main/dllama_tokenizer_deepseek-r1-distill-llama-8b.t?download=true (attempt: 0)
Downloaded 1 MB
All files are downloaded
To run Distributed Llama you need to execute:
--- copy start ---
./dllama chat --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 4 --max-seq-len 4096
--- copy end -----
Created run_deepseek_r1_distill_llama_8b_q40.sh script to easy run
Do you want to run Distributed Llama? ("Y" if yes): n
Hi, do you have enough free RAM on your systems? Dllama doesn't seem to check if the model will fit into RAM.
Hi @D-i-t-gh, I have enough free RAM (10 GB), while only about 6 GB is required here: RequiredMemory: 6285070 kB
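For anyone hitting the same question, a quick way to compare available memory against the figure dllama prints is a short shell check (a sketch: the 6285070 kB value is copied from the log above, and /proc/meminfo is Linux-only, so this applies to the Raspberry Pi but not the Mac):

```shell
# Compare MemAvailable against the RequiredMemory figure from the dllama log.
required_kb=6285070   # taken from the "RequiredMemory" line above
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
echo "available: ${avail_kb} kB, required: ${required_kb} kB"
if [ "$avail_kb" -ge "$required_kb" ]; then
  echo "model should fit in RAM"
else
  echo "not enough free RAM"
fi
```

MemAvailable is a better guide than MemFree here, since it accounts for reclaimable page cache.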