Deepak Soma Reddy
On each of the workers, I ran "./dllama worker --port 9998 --nthreads 8". On the root node, I ran "./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt...
@b4rtaz Thanks, I will share the details.
@b4rtaz Please find the logs for two setups: 2x NUC (12th Gen) with AVX2 support, and 4x NUC (12th Gen) with AVX2 support. All 4 NUCs are connected via a switch.
Thanks @b4rtaz. I tried connecting two devices directly, without a router, and the results are only slightly better: throughput improved by about 1 token/sec, from 5.98 token/sec (with...