Deepak Soma Reddy comments

Results: 4 comments of Deepak Soma Reddy

Not able see Scaling performance with NuC (12th Gen) with deepseek_r1_distill_llama_8b_q40

On each of the workers, I ran "./dllama worker --port 9998 --nthreads 8". On the root node, I ran "./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt...

@b4rtaz Thanks, I will share the details.

@b4rtaz Please find the logs below.

2x NUC (12th Gen) with AVX2 support:
![Image](https://github.com/user-attachments/assets/6e56d237-a347-42ed-ae52-71fd9c6559ea)

4x NUC (12th Gen) with AVX2 support:
![Image](https://github.com/user-attachments/assets/abff3a5a-05d0-47ca-a91e-d45afa42ad86)

All 4 NUCs are connected via a switch.
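
For comparing runs like the 2-node and 4-node ones above, a small helper can turn measured tokens/sec into speedup and parallel efficiency relative to the smallest cluster. This is only a sketch: the function name `scaling_report` is hypothetical, and the sample numbers are placeholders, not values read from the attached logs.

```python
def scaling_report(tokens_per_sec):
    """Map node count -> (speedup, efficiency) relative to the smallest node count.

    tokens_per_sec: dict of {node_count: measured tokens/sec}.
    Efficiency is speedup divided by the ideal linear speedup.
    """
    base_nodes = min(tokens_per_sec)
    base_rate = tokens_per_sec[base_nodes]
    report = {}
    for nodes in sorted(tokens_per_sec):
        speedup = tokens_per_sec[nodes] / base_rate
        ideal = nodes / base_nodes  # linear scaling would give this speedup
        report[nodes] = (speedup, speedup / ideal)
    return report


if __name__ == "__main__":
    # Placeholder measurements (assumed for illustration, NOT from the logs).
    sample = {1: 4.0, 2: 6.0, 4: 7.0}
    for nodes, (speedup, eff) in scaling_report(sample).items():
        print(f"{nodes} node(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

An efficiency well below 100% as nodes are added would suggest that something other than compute, for example the network link between the devices, limits throughput.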

Thanks @b4rtaz. I tried connecting two devices directly without a router, and the results are slightly better: throughput improved by 1 token/sec. I see only slightly better results from 5.98 token/sec (with...