Mark Ward

Results: 39 comments of Mark Ward

@dhiltgen, I have tested against `main` and I get the same issue: the model will sometimes load on CPU instead of GPU. My log may have some of...
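For anyone trying to confirm which device a load actually landed on, one low-tech check is to scan the server log for llama.cpp's tensor-offload line. This is only a sketch: the log path is a placeholder, and it keys off the `offloaded X/Y layers to GPU` line that llama.cpp prints while loading tensors.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// Scan an Ollama server log for llama.cpp's tensor-offload line to tell
// whether a given load landed on GPU or CPU. The log path below is just
// an example; adjust it to wherever your server writes its log.
func main() {
	f, err := os.Open("/tmp/ollama-server.log") // example path, not a real default
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// llama.cpp prints "offloaded X/Y layers to GPU" while loading tensors.
	re := regexp.MustCompile(`offloaded (\d+)/(\d+) layers to GPU`)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if m := re.FindStringSubmatch(sc.Text()); m != nil {
			if m[1] == "0" {
				fmt.Println("CPU load:", sc.Text())
			} else {
				fmt.Println("GPU load:", sc.Text())
			}
		}
	}
}
```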

@dhiltgen, I have checked out the `tags/v0.1.31` build, and my initial testing has not been able to reproduce the bug. I will continue to run the test on the...

What if I tried to isolate the change to Ollama or llama.cpp by testing v0.1.31 with the llama.cpp used in the newer release and in main? What is an approach to do...

I have a build of Ollama v0.1.31 running with the llama.cpp commit from v0.1.32, and I am testing it right now.

More details about the program I am using: it benchmarks a fixed list of prompts against the list of models loaded in Ollama. It sorts the list of models...
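The program itself isn't posted, but a minimal sketch of that kind of loop, assuming only the standard Ollama HTTP endpoints (`/api/tags` to list models, `/api/generate` to run a prompt; the prompts and base URL here are invented), would look like:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"sort"
	"time"
)

// Hypothetical version of the benchmark loop: list the installed models
// via /api/tags, sort them, then time each prompt against each model via
// /api/generate. Repeatedly loading different models back to back is what
// exercises the load/unload path where the bug shows up.
func main() {
	const base = "http://localhost:11434"

	resp, err := http.Get(base + "/api/tags")
	if err != nil {
		panic(err)
	}
	var tags struct {
		Models []struct {
			Name string `json:"name"`
		} `json:"models"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&tags); err != nil {
		panic(err)
	}
	resp.Body.Close()

	names := make([]string, 0, len(tags.Models))
	for _, m := range tags.Models {
		names = append(names, m.Name)
	}
	sort.Strings(names) // the comment mentions sorting the model list

	prompts := []string{"Why is the sky blue?", "Write a haiku about GPUs."}
	for _, model := range names {
		for _, prompt := range prompts {
			body, _ := json.Marshal(map[string]any{
				"model": model, "prompt": prompt, "stream": false,
			})
			start := time.Now()
			r, err := http.Post(base+"/api/generate", "application/json", bytes.NewReader(body))
			if err != nil {
				panic(err)
			}
			io.Copy(io.Discard, r.Body)
			r.Body.Close()
			fmt.Printf("%-20s %q took %v\n", model, prompt, time.Since(start))
		}
	}
}
```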

I can run it under a debugger with breakpoints. I will attempt to observe the execution during my testing.

@dhiltgen, I can reproduce the issue with `ollama run` on the `main` branch. I have a breakpoint on the `memory.go` line [slog.Debug("insufficient VRAM to load any model layers")](https://github.com/ollama/ollama/blob/3450a57d4a4a2c74892318b7f3986bab6461b2fc/llm/memory.go#L99). Start a new ollama instance,...
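For context, the decision that breakpoint sits on is roughly "does even one layer fit in the reported free VRAM." The following is a paraphrase under made-up numbers, not the upstream code; the real estimation logic in `llm/memory.go` is more involved:

```go
package main

import "log/slog"

// Rough paraphrase (not the upstream code) of the decision made at that
// breakpoint: given the free VRAM reported by GetGPUInfo, an estimated
// fixed overhead, and a per-layer cost, how many layers can be offloaded?
// Zero means the model falls back to CPU and the debug line fires.
func estimateLayers(freeVRAM, overhead, layerSize uint64, numLayers int) int {
	if freeVRAM < overhead+layerSize {
		slog.Debug("insufficient VRAM to load any model layers")
		return 0
	}
	layers := int((freeVRAM - overhead) / layerSize)
	if layers > numLayers {
		layers = numLayers
	}
	return layers
}

func main() {
	slog.SetLogLoggerLevel(slog.LevelDebug)
	// A stale, too-low free-VRAM reading (512 MiB here) makes zero layers
	// fit, which silently produces a CPU-only load.
	slog.Info("layers offloaded", "n", estimateLayers(512<<20, 1<<30, 400<<20, 33))
}
```

If the free-memory figure is stale because a previous model hasn't finished unloading, this is exactly the path that gets taken.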

@dhiltgen, I see there was a change in `gpu.go` from v0.1.31 to v0.1.32, and I have been watching the results of `GetGPUInfo()`. The OS is Ubuntu...
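To watch those results over time, a small probe can be built inside the Ollama tree. This is a sketch under assumptions: it uses the v0.1.32-era shape of `gpu.GetGPUInfo()` (a single `GpuInfo` with `Library`, `FreeMemory`, and `TotalMemory` fields), which later versions changed to a list:

```go
package main

import (
	"log/slog"
	"time"

	"github.com/ollama/ollama/gpu" // assumes building inside the Ollama source tree
)

// Probe sketch: poll GetGPUInfo and log the reported free memory over
// time, to see whether the value is stale right after a runner unloads.
// Field names (Library, FreeMemory, TotalMemory) are assumed from the
// v0.1.32-era GpuInfo struct.
func main() {
	for i := 0; i < 10; i++ {
		info := gpu.GetGPUInfo()
		slog.Info("gpu info",
			"library", info.Library,
			"free", info.FreeMemory,
			"total", info.TotalMemory)
		time.Sleep(500 * time.Millisecond)
	}
}
```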

It appears Ollama is getting the wrong available memory each time. Maybe the model has not fully unloaded before the GPU info call is made? I tested a situation where...
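One way to test that hypothesis without a debugger would be to poll free VRAM until consecutive readings agree before trusting the value. This is only a sketch of the experiment, not Ollama's code; `readFree` is a stand-in for whatever `GetGPUInfo()` ultimately queries:

```go
package main

import (
	"log/slog"
	"time"
)

// waitForVRAMSettle polls a free-VRAM reader until two consecutive
// readings agree, or a timeout expires. If the first reading differs
// from later ones, the unload was still in flight when we first asked.
func waitForVRAMSettle(readFree func() uint64, timeout time.Duration) uint64 {
	deadline := time.Now().Add(timeout)
	prev := readFree()
	for time.Now().Before(deadline) {
		time.Sleep(250 * time.Millisecond)
		cur := readFree()
		if cur == prev {
			return cur // stable reading; the unload has likely finished
		}
		slog.Debug("free VRAM still changing", "prev", prev, "cur", cur)
		prev = cur
	}
	return prev // timed out; return the last reading anyway
}

func main() {
	slog.SetLogLoggerLevel(slog.LevelDebug)
	// Fake reader that frees ~1 GiB every 200ms, mimicking a slow unload.
	start := time.Now()
	fake := func() uint64 {
		freed := uint64(time.Since(start) / (200 * time.Millisecond))
		if freed > 8 {
			freed = 8
		}
		return 4<<30 + freed*(1<<30)
	}
	slog.Info("settled free VRAM", "bytes", waitForVRAMSettle(fake, 5*time.Second))
}
```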

If I have breakpoints in the code where the runner is unloaded and right at the beginning of `initGPUHandles()`, I can't reproduce the issue. I think...