Jaap Buurman

101 comments by Jaap Buurman

Gemma 3 did NOT show the issue, so SWA can be ruled out. gpt-oss-20b does show the same issue: ``` | PP | TG | B | N_KV | T_PP...

Actually, it seems my initial post is still mostly correct. It obviously doesn't impact all MoE models equally (strongly depends on quant), but running the test with Qwen3-30B-A3B Q4_K_S shows...

From my understanding, the backends use vec_mat_muls for the single batch case and mat_mat_muls for the batched case. Could it be that the mat_mat_mul pathway is being used, but since...

> That was going to be my guess too - we have an optimized path for N=[2,8] for MAT_MUL but not for MAT_MUL_ID. How difficult would it be to check...
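The hypothesis in the two comments above can be sketched as a small dispatch function: single-token multiplications take a vector path, batches of 2 to 8 columns get a dedicated fast path for MAT_MUL, and MAT_MUL_ID (the expert-indexed multiplication used by MoE layers) falls through to the general matrix-matrix kernel. This is a minimal illustration under stated assumptions. The enum values and `choose_path` are hypothetical and are not ggml identifiers, and the 2 to 8 range is taken from the quoted comment rather than from the backend source.

```c
#include <assert.h>

/* Hypothetical labels for illustration only; these are not ggml identifiers. */
typedef enum {
    OP_MAT_MUL,    /* dense matrix multiplication */
    OP_MAT_MUL_ID  /* expert-indexed multiplication used by MoE layers */
} matmul_op;

typedef enum {
    PATH_VEC_MAT,       /* single-column path (n_cols == 1) */
    PATH_VEC_MAT_SMALL, /* assumed fast path for small batches */
    PATH_MAT_MAT        /* general matrix-matrix path */
} matmul_path;

/* Assumed range of the small-batch fast path, taken from the quoted comment. */
#define SMALL_N_MIN 2
#define SMALL_N_MAX 8

/* n_cols is the number of activation columns, i.e. tokens processed together. */
static matmul_path choose_path(matmul_op op, int n_cols) {
    assert(n_cols >= 1);
    if (n_cols == 1) {
        return PATH_VEC_MAT;
    }
    /* Under the hypothesis, only OP_MAT_MUL gets the small-N fast path. */
    if (op == OP_MAT_MUL && n_cols >= SMALL_N_MIN && n_cols <= SMALL_N_MAX) {
        return PATH_VEC_MAT_SMALL;
    }
    /* OP_MAT_MUL_ID with 2..8 columns falls through to the general kernel. */
    return PATH_MAT_MAT;
}
```

Under this model, a batch of 4 tokens through a dense layer would use the small-N path while the same batch through an MoE layer would use the general kernel, which would be consistent with batched generation scaling worse on MoE models. Whether the real backend dispatches this way is exactly what the comment proposes to check.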

One thing to note is that disabling coopmat seems to make the issue worse: Qwen3-30B-A3B Q4_K_S: ``` | PP | TG | B | N_KV | T_PP s | S_PP...

I am still interested in this issue. Let me know if I can help in any way with your research @0cc4m

I see you also opened a PR for Cline to actually utilize this. Is there any chance you could do the same for Roo Code? I have been using both...

@ochafik Roo Code is a fork of Cline that is rapidly growing in popularity. Since it's a fork, I am hopeful it's relatively "easy" to port your PR to Roo Code.

The same is happening on Linux with version 24.1.2. Version 24.1.1 is fine, so I have fixed it by downgrading for now. It seems to happen because the window scales...

> [#16932](https://github.com/ggml-org/llama.cpp/pull/16932) sort of works, but in my testing with Open Hands it keeps stopping for some reason. I have to type "continue" constantly and it gets stuck in repetitive...