qdurllm
Cease support for llama.cpp-served Gemma
See #2; the current llama.cpp-served Gemma setup is also inefficient.
Explore new local serving methods, such as quantization (not dockerizable) and the llama-cpp-python package (see the sketch below).
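As a starting point, here is a minimal sketch of serving a quantized Gemma model through the llama-cpp-python package; the model path, context size, and generation parameters are placeholder assumptions, not settings from this repo:

```python
# Minimal sketch: load a quantized (GGUF) Gemma model with llama-cpp-python
# and run a single chat completion. Model path and parameters below are
# hypothetical placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-2b-it.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=2048,   # context window size
    n_threads=4,  # CPU threads to use for inference
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what Qdrant does."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

Since this runs in-process rather than against a separate llama.cpp server, it could simplify dockerization compared to the current setup, though that remains to be verified.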