Jaap Buurman comments

Results: 101 comments by Jaap Buurman

Misc. bug: Vulkan backend shows negative scaling at low batch sizes with MOE models

Gemma 3 did NOT show the issue, so SWA can be ruled out. gpt-oss-20b does show the same issue: ``` | PP | TG | B | N_KV | T_PP...

Misc. bug: Vulkan backend shows negative scaling at low batch sizes with MOE models

Actually, it seems my initial post is still mostly correct. It obviously doesn't impact all MOE models equally (it strongly depends on the quant), but running the test with Qwen3-30B-A3B Q4_K_S shows...

Misc. bug: Vulkan backend shows negative scaling at low batch sizes with MOE models

From my understanding, the backends use vec_mat_muls for the single batch case and mat_mat_muls for the batched case. Could it be that the mat_mat_mul pathway is being used, but since...

Misc. bug: Vulkan backend shows negative scaling at low batch sizes with MOE models

> That was going to be my guess too - we have an optimized path for N=[2,8] for MAT_MUL but not for MAT_MUL_ID. How difficult would it be to check...

Misc. bug: Vulkan backend shows negative scaling at low batch sizes with MOE models

One thing to note is that disabling coopmat seems to make the issue worse: Qwen3-30B-A3B Q4_K_S: ``` | PP | TG | B | N_KV | T_PP s | S_PP...

Misc. bug: Vulkan backend shows negative scaling at low batch sizes with MOE models

I am still interested in this issue. Let me know if I can help in any way with your research, @0cc4m.

`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars

I see you also opened a PR for Cline to actually utilize this. Is there any chance you could do the same for Roo Code? I have been using both...

`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars

@ochafik Roo Code is a fork of Cline that is rapidly growing in popularity. Since it's a fork, I am hopeful it's relatively "easy" to port your PR to Roo Code instead.

Dump window expands out of screen when the dump is run more than one time

The same is happening on Linux with version 24.1.2. Version 24.1.1 is fine, so I have fixed it by downgrading for now. It seems to happen because the window scales...

Feature Request: Kimi-K2-Thinking reasoning and tool calling support

> [#16932](https://github.com/ggml-org/llama.cpp/pull/16932) sort of works, but in my testing with Open Hands it keeps stopping for some reason. I have to type "continue" constantly and it gets stuck in repetitive...