
segmentation fault

Open zhangddjs opened this issue 7 months ago • 4 comments

While trying to start the deepseek_r1_distill_llama_8b_q40 model on my Raspberry Pi 4B 8G machine, it failed with a segmentation fault, as shown below. The same Pi can successfully run a smaller Llama 1B model.

sudo nice -n -20 ./dllama chat --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 3 --max-seq-len 256

[sudo] password for zhangddjs:
📄 BosId: 128000 (<|begin▁of▁sentence|>)
📄 EosId: 128001 (<|end▁of▁sentence|>) 128001 (<|end▁of▁sentence|>)
📄 RegularVocabSize: 128000
📄 SpecialVocabSize: 256
💡 Arch: Llama
💡 HiddenAct: Silu
💡 Dim: 4096
💡 KvDim: 1024
💡 HiddenDim: 14336
💡 VocabSize: 128256
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 8
💡 OrigSeqLen: 131072
💡 SeqLen: 256
💡 NormEpsilon: 0.000010
💡 RopeType: Llama3.1
💡 RopeTheta: 500000
💡 RopeScaling: f=8.0, l=1.0, h=4.0, o=8192
📀 RequiredMemory: 6285070 kB
🧠 CPU: neon fp16
💿 Loading weights...
[1] 36355 segmentation fault  sudo nice -n -20 ./dllama chat --model --tokenizer --buffer-float-type q80
FAIL
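For reference, a back-of-envelope check of whether this model should fit in 8 GB, using only the numbers printed in the log above (the float32 KV-cache entry size is a rough assumption, not dllama's exact accounting):

```python
# Rough memory estimate from the log: weights = reported download size,
# KV cache = keys + values per layer. float32 entries are assumed here
# as an upper bound; dllama's real accounting may differ.

weights_mb = 5545                      # reported q40 model download size
n_layers, seq_len, kv_dim = 32, 256, 1024   # from the model header above

kv_cache_bytes = 2 * n_layers * seq_len * kv_dim * 4  # keys + values, 4 B each
kv_cache_mb = kv_cache_bytes / (1024 * 1024)

total_mb = weights_mb + kv_cache_mb
print(f"KV cache ~ {kv_cache_mb:.0f} MiB, total ~ {total_mb:.0f} MiB")
```

This lands in the same ballpark as the reported `RequiredMemory: 6285070 kB`, i.e. the model should nominally fit on an 8 GB Pi, so the crash looks like something other than a simple out-of-memory condition.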

zhangddjs avatar May 16 '25 06:05 zhangddjs

I have also tried on a MacBook M4 Pro, and it failed with the same problem.

zhangddjs avatar May 16 '25 07:05 zhangddjs

the launch command and output:

python launch.py deepseek_r1_distill_llama_8b_q40         [5:21:18]
πŸ“€ Downloading deepseek_r1_distill_llama_8b_q40 to models/deepseek_r1_distill_llama_8b_q40...
πŸ“„ https://huggingface.co/b4rtaz/DeepSeek-R1-Distill-Llama-8B-Distributed-Llama/resolve/main/dllama_model_deepseek-r1-distill-llama-8b_q40.m?download=true (attempt: 0)
Downloaded 5545 MB
 βœ…
πŸ“„ https://huggingface.co/b4rtaz/DeepSeek-R1-Distill-Llama-8B-Distributed-Llama/resolve/main/dllama_tokenizer_deepseek-r1-distill-llama-8b.t?download=true (attempt: 0)
Downloaded 1 MB
 βœ…
πŸ“€ All files are downloaded
To run Distributed Llama you need to execute:
--- copy start ---

./dllama chat --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 4 --max-seq-len 4096

--- copy end -----
🌻 Created run_deepseek_r1_distill_llama_8b_q40.sh script to easy run
❓ Do you want to run Distributed Llama? ("Y" if yes): n


zhangddjs avatar May 16 '25 07:05 zhangddjs

Hi, do you have enough free RAM on your systems? Dllama doesn't seem to check if the model will fit into RAM.
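One quick way to check free memory (a minimal sketch, assuming a Linux system such as the Raspberry Pi, where `/proc/meminfo` exists; on macOS you would use `vm_stat` instead):

```python
import os

def mem_available_kb(meminfo_text: str) -> int:
    """Parse the MemAvailable line from /proc/meminfo content (value in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")

# Guarded so this also runs harmlessly on non-Linux systems.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        kb = mem_available_kb(f.read())
    print(f"MemAvailable: {kb} kB (~{kb / 1024**2:.1f} GiB)")
```

`MemAvailable` is the kernel's estimate of memory usable without swapping, which is the figure to compare against dllama's `RequiredMemory`.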

D-i-t-gh avatar May 17 '25 06:05 D-i-t-gh

Hi @D-i-t-gh , I have enough free RAM (10 GB), while it only requires about 6 GB: 📀 RequiredMemory: 6285070 kB
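For reference, the reported figure converted to GiB (simple unit arithmetic, taking the kB value from the log at face value):

```python
# Convert dllama's reported RequiredMemory from kB to GiB.
required_kb = 6285070
required_gib = required_kb / (1024 * 1024)
print(f"{required_gib:.2f} GiB")
```

That is just under 6 GiB, so with 10 GB free the weights and KV cache should fit comfortably, supporting the conclusion that this is not an out-of-memory failure.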

zhangddjs avatar May 18 '25 04:05 zhangddjs