Mark Ward

Results: 39 comments of Mark Ward

@dhiltgen, I have tested against `main` and I get the same issue: the model will sometimes load on CPU instead of GPU. My log may have some of...
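For anyone trying to confirm which device a load actually landed on, one low-tech check is to scan the server log for llama.cpp's tensor-offload line. This is only a sketch: the log path is a placeholder, and it keys off the `offloaded X/Y layers to GPU` line that llama.cpp prints while loading tensors.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// Scan an Ollama server log for llama.cpp's tensor-offload line to tell
// whether a given load landed on GPU or CPU. The log path below is just
// an example; adjust it to wherever your server writes its log.
func main() {
	f, err := os.Open("/tmp/ollama-server.log") // example path, not a real default
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// llama.cpp prints "offloaded X/Y layers to GPU" while loading tensors.
	re := regexp.MustCompile(`offloaded (\d+)/(\d+) layers to GPU`)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if m := re.FindStringSubmatch(sc.Text()); m != nil {
			if m[1] == "0" {
				fmt.Println("CPU load:", sc.Text())
			} else {
				fmt.Println("GPU load:", sc.Text())
			}
		}
	}
}
```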

@dhiltgen, I have checked out the `tags/v0.1.31` build, and my initial testing has not been able to reproduce the bug. I will continue to run the test on the...

What if I tried to isolate the change to Ollama or llama.cpp by testing v0.1.31 with the llama.cpp used in the newer release and in main? What is an approach to do...

I have a build of Ollama v0.1.31 running with the llama.cpp commit from v0.1.32, and I am testing it right now.

More details about the program I am using: it benchmarks a fixed list of prompts against the list of models loaded in Ollama. It sorts the list of models...
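The program itself isn't posted, but a minimal sketch of that kind of loop, assuming only the standard Ollama HTTP endpoints (`/api/tags` to list models, `/api/generate` to run a prompt; the prompts and base URL here are invented), would look like:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"sort"
	"time"
)

// Hypothetical version of the benchmark loop: list the installed models
// via /api/tags, sort them, then time each prompt against each model via
// /api/generate. Repeatedly loading different models back to back is what
// exercises the load/unload path where the bug shows up.
func main() {
	const base = "http://localhost:11434"

	resp, err := http.Get(base + "/api/tags")
	if err != nil {
		panic(err)
	}
	var tags struct {
		Models []struct {
			Name string `json:"name"`
		} `json:"models"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&tags); err != nil {
		panic(err)
	}
	resp.Body.Close()

	names := make([]string, 0, len(tags.Models))
	for _, m := range tags.Models {
		names = append(names, m.Name)
	}
	sort.Strings(names) // the comment mentions sorting the model list

	prompts := []string{"Why is the sky blue?", "Write a haiku about GPUs."}
	for _, model := range names {
		for _, prompt := range prompts {
			body, _ := json.Marshal(map[string]any{
				"model": model, "prompt": prompt, "stream": false,
			})
			start := time.Now()
			r, err := http.Post(base+"/api/generate", "application/json", bytes.NewReader(body))
			if err != nil {
				panic(err)
			}
			io.Copy(io.Discard, r.Body)
			r.Body.Close()
			fmt.Printf("%-20s %q took %v\n", model, prompt, time.Since(start))
		}
	}
}
```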

I can run it under a debugger with breakpoints. I will attempt to observe the execution during my testing.

@dhiltgen, I can reproduce the issue with `ollama run` on the `main` branch. I have a breakpoint on the `memory.go` line [slog.Debug("insufficient VRAM to load any model layers")](https://github.com/ollama/ollama/blob/3450a57d4a4a2c74892318b7f3986bab6461b2fc/llm/memory.go#L99). Start a new ollama instance,...
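For context, the decision that breakpoint sits on is roughly "does even one layer fit in the reported free VRAM." The following is a paraphrase under made-up numbers, not the upstream code; the real estimation logic in `llm/memory.go` is more involved:

```go
package main

import "log/slog"

// Rough paraphrase (not the upstream code) of the decision made at that
// breakpoint: given the free VRAM reported by GetGPUInfo, an estimated
// fixed overhead, and a per-layer cost, how many layers can be offloaded?
// Zero means the model falls back to CPU and the debug line fires.
func estimateLayers(freeVRAM, overhead, layerSize uint64, numLayers int) int {
	if freeVRAM < overhead+layerSize {
		slog.Debug("insufficient VRAM to load any model layers")
		return 0
	}
	layers := int((freeVRAM - overhead) / layerSize)
	if layers > numLayers {
		layers = numLayers
	}
	return layers
}

func main() {
	slog.SetLogLoggerLevel(slog.LevelDebug)
	// A stale, too-low free-VRAM reading (512 MiB here) makes zero layers
	// fit, which silently produces a CPU-only load.
	slog.Info("layers offloaded", "n", estimateLayers(512<<20, 1<<30, 400<<20, 33))
}
```

If the free-memory figure is stale because a previous model hasn't finished unloading, this is exactly the path that gets taken.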

@dhiltgen, I see there was a change in `gpu.go` from v0.1.31 to v0.1.32, and I have been watching the results of `GetGPUInfo()`. The OS is Ubuntu...
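To watch those results over time, a small probe can be built inside the Ollama tree. This is a sketch under assumptions: it uses the v0.1.32-era shape of `gpu.GetGPUInfo()` (a single `GpuInfo` with `Library`, `FreeMemory`, and `TotalMemory` fields), which later versions changed to a list:

```go
package main

import (
	"log/slog"
	"time"

	"github.com/ollama/ollama/gpu" // assumes building inside the Ollama source tree
)

// Probe sketch: poll GetGPUInfo and log the reported free memory over
// time, to see whether the value is stale right after a runner unloads.
// Field names (Library, FreeMemory, TotalMemory) are assumed from the
// v0.1.32-era GpuInfo struct.
func main() {
	for i := 0; i < 10; i++ {
		info := gpu.GetGPUInfo()
		slog.Info("gpu info",
			"library", info.Library,
			"free", info.FreeMemory,
			"total", info.TotalMemory)
		time.Sleep(500 * time.Millisecond)
	}
}
```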

It appears Ollama is getting the wrong available memory each time. Maybe the model has not fully unloaded before the GPU info call is made? I tested a situation where...
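One way to test that hypothesis without a debugger would be to poll free VRAM until consecutive readings agree before trusting the value. This is only a sketch of the experiment, not Ollama's code; `readFree` is a stand-in for whatever `GetGPUInfo()` ultimately queries:

```go
package main

import (
	"log/slog"
	"time"
)

// waitForVRAMSettle polls a free-VRAM reader until two consecutive
// readings agree, or a timeout expires. If the first reading differs
// from later ones, the unload was still in flight when we first asked.
func waitForVRAMSettle(readFree func() uint64, timeout time.Duration) uint64 {
	deadline := time.Now().Add(timeout)
	prev := readFree()
	for time.Now().Before(deadline) {
		time.Sleep(250 * time.Millisecond)
		cur := readFree()
		if cur == prev {
			return cur // stable reading; the unload has likely finished
		}
		slog.Debug("free VRAM still changing", "prev", prev, "cur", cur)
		prev = cur
	}
	return prev // timed out; return the last reading anyway
}

func main() {
	slog.SetLogLoggerLevel(slog.LevelDebug)
	// Fake reader that frees ~1 GiB every 200ms, mimicking a slow unload.
	start := time.Now()
	fake := func() uint64 {
		freed := uint64(time.Since(start) / (200 * time.Millisecond))
		if freed > 8 {
			freed = 8
		}
		return 4<<30 + freed*(1<<30)
	}
	slog.Info("settled free VRAM", "bytes", waitForVRAMSettle(fake, 5*time.Second))
}
```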

If I have breakpoints in the code where the runner is unloaded and right at the beginning of `initGPUHandles()`, I can't reproduce the issue. I think...