JNLLM
Same case here. While I can run 12B models easily, Gemma 3 12B gets its cache offloaded, and leaving the V cache unquantized is not an option in low-VRAM situations. If...