Eve
> Best way to improve the speed is to use as small a model as possible. You can try @karpathy's tinyllamas: https://huggingface.co/karpathy/tinyllamas
>
> Here are instructions for converting to GGUF...
> Well, I have been wondering for a while why nobody is training quantized models directly. Given how close we can come to the performance of the `fp16` model with...
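As a concrete illustration of what "training quantized models directly" can mean: a minimal sketch of quantization-aware training using symmetric int8 fake quantization with a straight-through estimator. This is a generic QAT illustration, not code from the quoted message; every name in it is made up.

```python
# Sketch only: fake-quantize weights in the forward pass, pass gradients
# straight through in the backward pass (straight-through estimator, STE).
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits=8):
        qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8
        scale = w.abs().max() / qmax + 1e-8   # symmetric per-tensor scale
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                 # STE: ignore the rounding step

# Toy regression trained through the quantizer.
w = torch.randn(16, 16, requires_grad=True)
x, y = torch.randn(32, 16), torch.randn(32, 16)
opt = torch.optim.SGD([w], lr=1e-2)
for _ in range(100):
    loss = ((x @ FakeQuant.apply(w)) - y).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the STE is that rounding has zero gradient almost everywhere, so the backward pass pretends the quantizer is the identity; the optimizer still sees full-precision weights, but the loss is always computed against their quantized values.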
llama-cpp-python worked fine with Vulkan last night (on Linux) when I built it against my PR https://github.com/ggerganov/llama.cpp/pull/5182.
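For anyone wanting to reproduce a build like this, a rough smoke test: build llama-cpp-python with the Vulkan backend enabled, then load any local GGUF with all layers offloaded. The CMake flag name and the model path below are assumptions, not taken from the PR.

```python
# Quick smoke test for a Vulkan-enabled llama-cpp-python build.
# Build first (the exact flag name for the Vulkan backend is assumed here):
#   CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="stories15M.gguf",  # placeholder: any GGUF model you have locally
    n_gpu_layers=-1,               # offload every layer to the GPU backend
)
print(llm("Once upon a time", max_tokens=32)["choices"][0]["text"])
```

If the Vulkan device is picked up, llama.cpp prints the detected device in the load log; generation completing without a fallback to CPU-only layers is the signal the build worked.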