```
time=2024-12-05T01:17:11.613-08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2]"
```
The version you've built doesn't have any runners that use the GPU, only CPU runners.
Which build guide? Fedora 41 appears to be [not supported yet](https://github.com/ollama/ollama/issues/7869), so building from source may not work yet. If you have Docker installed, you could try the Docker image instead.
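If you do go the Docker route, something like this is the usual starting point for an NVIDIA GPU. It assumes the NVIDIA Container Toolkit is already configured for Docker, and the model name below is just a placeholder:

```shell
# Run the official Ollama image with GPU access
# (requires nvidia-container-toolkit to be set up for Docker)
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

# Then pull and run a model inside the container
docker exec -it ollama ollama run llama3.2
```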
Most releases that aren't bleeding edge should work. Fedora 41 was released on October 29, 2024, so it will take a little work to make sure all the right dependencies are...
Ollama has probably done the tensor split sub-optimally. [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help with debugging. What parameters do you use when you run llama.cpp directly?
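For comparison, the flags that matter most in llama.cpp are the GPU layer count and the tensor split. A sketch of what I mean (the model path and split ratios are placeholders; older builds call the binary `main` rather than `llama-cli`):

```shell
# -ngl: number of layers to offload to the GPU(s)
# --tensor-split: proportion of the model assigned to each GPU
./llama-cli -m /path/to/model.gguf -ngl 99 --tensor-split 3,1 -p "Hello"
```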
What's the token generation rate for both configurations?
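With ollama the easiest way to get this is the `--verbose` flag, which prints timing stats (including eval rate in tokens/s) after each response; llama.cpp prints similar timings when it exits. Substitute your own model name:

```shell
# Prints prompt eval / eval rates (tokens per second) after the response
ollama run llama3.2 --verbose
```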
Logs from earlier in the run will show how ollama calculated the available memory and how many layers to offload.
Do you have the logs for this allocation? The logs you posted earlier were from two different runs and it's difficult to piece together the flow.
What's more interesting is what ollama thought the state of the GPU was before it tried to allocate layers. The distribution algorithm tries to equalize across GPUs, but doesn't account for...
If the model is loading and unloading from VRAM it will be recorded in the logs. But this behaviour isn't normal, so a screen recording may shed some light.
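If capturing it as text is easier than a screen recording, sampling VRAM usage while you reproduce the issue would show the same thing. This assumes an NVIDIA GPU:

```shell
# Log GPU memory usage once per second while reproducing the issue
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1 | tee vram.log
```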
Can you add the logs for this period?
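On Linux, if ollama is running as a systemd service, the troubleshooting guide linked above points at the journal; you can narrow it to the window in question (the timestamps here are placeholders):

```shell
# Server logs for a specific time window (adjust the timestamps)
journalctl -u ollama --since "2024-12-05 01:00:00" --until "2024-12-05 01:30:00" --no-pager
```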