Patrick Devine

323 comments by Patrick Devine

@mindspawn Can you attach the logs and the Modelfile? Also, do you have a link to the gguf binary, or did you convert it yourself?

I just tried it and everything in `0.1.38` seems to be working just fine. I ran `OLLAMA_HOST=0.0.0.0:11434 ollama serve`, and then on a separate host:

```
OLLAMA_HOST=x.x.x.x ollama run llama3
>>> hi...
```
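If you want to check the same remote setup without the CLI, here's a minimal Python sketch that turns an `OLLAMA_HOST`-style value into a base URL and sends one non-streaming prompt to `/api/generate`. Filling in `http://` and port `11434` for a bare host is an assumption about how the client interprets it, and `base_url_from_host` and `ask` are hypothetical helper names, not part of Ollama.

```python
import json
import os
import urllib.request

DEFAULT_PORT = 11434  # assumed default when OLLAMA_HOST has no port


def base_url_from_host(host: str) -> str:
    """Hypothetical helper: normalize an OLLAMA_HOST-style value to a base URL."""
    host = host.strip().rstrip("/")
    if "://" in host:
        # Explicit scheme: leave it (and its usual default port) alone.
        return host
    if ":" not in host:
        # Bare host such as "x.x.x.x": assume the Ollama default port.
        # Note: IPv6 literals are not handled by this sketch.
        host = f"{host}:{DEFAULT_PORT}"
    return f"http://{host}"


def ask(model: str, prompt: str) -> str:
    """Hypothetical helper: send one non-streaming prompt and return the reply text."""
    base = base_url_from_host(os.environ.get("OLLAMA_HOST", "127.0.0.1"))
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

With `OLLAMA_HOST=x.x.x.x` exported on the client machine, `print(ask("llama3", "hi"))` should hit the same server the `ollama run` command above talks to.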

@x66ccff can you try updating to ollama `0.1.38`?

@15731807423 what's the output of `ollama ps`? It should tell you how much of the model is on the GPU and how much is on the CPU.
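As a rough illustration of what that GPU/CPU column means, here's a sketch that derives a split from the `size` and `size_vram` fields returned by the `/api/ps` endpoint. The exact formatting and rounding are assumptions meant to approximate the `ollama ps` output, and `processor_split` and `loaded_models` are hypothetical helper names.

```python
import json
import urllib.request


def processor_split(size: int, size_vram: int) -> str:
    """Approximate how much of a loaded model is in VRAM vs. system RAM."""
    if size <= 0 or size_vram > size:
        return "Unknown"
    if size_vram == 0:
        return "100% CPU"
    if size_vram == size:
        return "100% GPU"
    # Whatever isn't in VRAM is assumed to be running on the CPU.
    cpu_pct = round((size - size_vram) / size * 100)
    return f"{cpu_pct}%/{100 - cpu_pct}% CPU/GPU"


def loaded_models(base_url: str = "http://127.0.0.1:11434") -> list[tuple[str, str]]:
    """List (model name, split) for each model the server currently has loaded."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        models = json.load(resp).get("models", [])
    return [(m["name"], processor_split(m["size"], m["size_vram"])) for m in models]
```

A partially offloaded model (like a 70b that doesn't fit in VRAM) would show up with a mixed CPU/GPU split here, while a model that fits entirely would show `100% GPU`.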

@15731807423 looks like 70b is being partially offloaded, and 8b is fully running on the GPU. When you run `/set verbose`, how many tokens/second are you getting? With...
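If you're collecting numbers through the API instead of the REPL, a sketch like this computes a tokens/second rate from a non-streaming `/api/generate` response, assuming the `eval_count`/`eval_duration` fields (durations in nanoseconds) are present. `tokens_per_second` and `rates` are hypothetical helpers, and the figures may not match `/set verbose` to the last decimal.

```python
def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration into tokens/second."""
    if duration_ns <= 0:
        raise ValueError("duration must be positive")
    return count / (duration_ns / 1e9)


def rates(resp: dict) -> dict:
    """Summarize generation (and, if reported, prompt) speed from a response dict."""
    out = {"eval rate": tokens_per_second(resp["eval_count"], resp["eval_duration"])}
    # Prompt stats can be absent, e.g. when the prompt was already cached.
    if resp.get("prompt_eval_count") and resp.get("prompt_eval_duration"):
        out["prompt eval rate"] = tokens_per_second(
            resp["prompt_eval_count"], resp["prompt_eval_duration"]
        )
    return out
```

Comparing the `eval rate` for 8b (fully on the GPU) against 70b (partially offloaded) should make the cost of the CPU/GPU split visible.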

It is using the GPU, but it's not particularly *efficient* at using it, both because the model is split across the CPU and GPU and because of the limitations of the computer (like...

Can you post the Modelfile and the logs? What was the gguf you were using?

Can you include the Modelfile as well?