JNLLM comments




JNLLM

Eval bug: Gemma 3 extremely slow prompt processing when using quantized KV cache.

Same case here: I can run other 12B models easily, but Gemma 3 12B gets its cache offloaded. And leaving the V cache unquantized is not an option in low-VRAM situations. If...