Jaap Buurman
Gemma 3 did NOT show the issue, so SWA can be ruled out. gpt-oss-20b does show the same issue: ``` | PP | TG | B | N_KV | T_PP...
Actually, it seems my initial post is still mostly correct. It obviously doesn't affect all MoE models equally (it depends strongly on the quant), but running the test with Qwen3-30B-A3B Q4_K_S shows...
From my understanding, the backends use vec_mat_muls for the single batch case and mat_mat_muls for the batched case. Could it be that the mat_mat_mul pathway is being used, but since...
> That was going to be my guess too - we have an optimized path for N=[2,8] for MAT_MUL but not for MAT_MUL_ID. How difficult would it be to check...
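To make the hypothesis in the two comments above concrete, here is a minimal sketch of the kind of kernel selection being described. It is an illustration only, not the backend's actual code: the function `select_kernel` and the returned path names are hypothetical, and the only facts taken from the discussion are that the single-batch case uses a vector path, that `MAT_MUL` has an optimized path for N in [2, 8], and that `MAT_MUL_ID` does not.

```python
# Hypothetical sketch of the dispatch discussed above; names are illustrative
# and do not correspond to real functions in any backend.

def select_kernel(op: str, n: int) -> str:
    """Return the (hypothetical) kernel path for an op and batch size n."""
    if n < 1:
        raise ValueError("batch size must be at least 1")
    if n == 1:
        # Single-batch case: matrix-vector path.
        return "mat_vec"
    if op == "MAT_MUL" and 2 <= n <= 8:
        # Optimized small-batch path, said to exist only for MAT_MUL.
        return "mat_vec_small_batch"
    # Everything else, including MAT_MUL_ID at small N, falls through
    # to the general matrix-matrix path.
    return "mat_mat"


if __name__ == "__main__":
    for op in ("MAT_MUL", "MAT_MUL_ID"):
        for n in (1, 2, 8, 16):
            print(f"{op:11s} N={n:2d} -> {select_kernel(op, n)}")
```

If the real dispatch behaves like this, small batches of `MAT_MUL_ID` (the op used for MoE expert routing) would land on the general matrix-matrix path, which would be consistent with the slowdown showing up only on MoE models.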
One thing to note is that disabling coopmat seems to make the issue worse: Qwen3-30B-A3B Q4_K_S: ``` | PP | TG | B | N_KV | T_PP s | S_PP...
I am still interested in this issue. Let me know if I can help in any way with your research, @0cc4m.
I see you also opened a PR for Cline to actually utilize this. Is there any chance you could do the same for Roo Code? I have been using both...
@ochafik Roo Code is a fork of Cline that is rapidly growing in popularity. Since it's a fork, I am hopeful it's relatively "easy" to port your PR to Roo Code instead.
The same is happening on Linux with version 24.1.2. Version 24.1.1 is fine, so I have fixed it by downgrading for now. It seems to happen because the window scales...
> [#16932](https://github.com/ggml-org/llama.cpp/pull/16932) sort of works, but in my testing with Open Hands it keeps stopping for some reason. I have to type "continue" constantly and it gets stuck in repetitive...