Patrick Devine
@mindspawn Can you attach the logs and the Modelfile? Also, do you have a link to the gguf binary, or did you convert it yourself?
I just tried it and everything in `0.1.38` seems to be working just fine. `OLLAMA_HOST=0.0.0.0:11434 ollama serve` and then on a separate host:

```
OLLAMA_HOST=x.x.x.x ollama run llama3
>>> hi...
```
What's the output of `ollama ps`?
@x66ccff can you try updating to ollama `0.1.38`?
@15731807423 what's the output of `ollama ps`? It should tell you how much of the model is on the GPU and how much is on the CPU.
@15731807423 looks like 70b is being partially offloaded, and 8b is fully running on the GPU. When you do `/set verbose` how many tokens / second are you getting? With...
It is using the GPU, but it's not particularly *efficient* at using it, because the model is split across the CPU and GPU, and because of the limitations of the computer (like...
cc @BruceMacD
Can you post the Modelfile and the logs? What was the gguf you were using?
Can you include the Modelfile as well?