JNLLM

Same case here: I can run 12b models easily, but gemma3 12b gets its cache offloaded. And leaving the V cache unquantized is not an option in low-VRAM situations. If...