Abdullah Malik
This looks really interesting! Having TP (tensor parallelism) support like vLLM does would bring some great speed-ups!
Any updates on this? This seems like a great way to get a few more % for both ROCm and CUDA!
The command is `./llama-server -m /home/ultimis/LLM/Models/ggml-org/gpt-oss-120b-GGUF/gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 131072 -ngl 999 -b 2048 -ub 2048 -fa --reasoning-format none --jinja --chat-template-kwargs '{"reasoning_effort":"high"}' --host 0.0.0.0 --port 8081 -lv 1`. With `-lv 1`, it is spitting out...
Same issue after removing `--reasoning-format none`; the command is now: `./llama-server -m /home/ultimis/LLM/Models/ggml-org/gpt-oss-120b-GGUF/gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 131072 -ngl 999 -b 2048 -ub 2048 -fa --jinja --chat-template-kwargs '{"reasoning_effort":"high"}' --host 0.0.0.0 --port 8081 -lv 1`. Started...
Exact same issue on Vulkan: https://gist.github.com/AbdullahMPrograms/15e1ba6a43c26974e97f7a1b897bab2f
Given these missing kernels in hipBLASLt, does that mean this is not fixable? I've begun to notice this stalled-generation issue more and more while using GPT-OSS.
@ggerganov this fixes the issue for Vulkan! Vulkan is still not as performant as ROCm for text generation, but at least it works!
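For anyone else verifying this, a minimal rebuild sketch to pick up the fix with the Vulkan backend (this assumes a source checkout of llama.cpp and an installed Vulkan SDK; the build directory name is just a placeholder for your own setup):

```sh
# Reconfigure with the Vulkan backend enabled, then rebuild
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```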
Recompiling with `-DGGML_CUDA_FORCE_MMQ=ON`, however, has solved the issue for me. I have not yet done any speed testing, but performance seems comparable.
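For reference, a minimal sketch of that rebuild, assuming a CUDA source build of llama.cpp (the build directory and any extra flags are placeholders for your own configuration):

```sh
# Force the quantized matmul (MMQ) kernels instead of the cuBLAS paths
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON
cmake --build build --config Release -j
```

After rebuilding, the same `./llama-server` command from above can be rerun to check whether the stalled generation is gone.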
It's dual Xeon 4110s. I also tried MMTool 5.007 and it was the same thing. I see you were able to see the file names; can you confirm which version...