Mark Ward
@dhiltgen, I have tested against main and I get the same issue where the model will sometimes load on the CPU instead of the GPU. My log may have some of...
@dhiltgen, I have checked out and built `tags/v0.1.31`, and my initial testing has not been able to reproduce the bug. I will continue to run the test on the...
What if I tried to isolate the change to Ollama or llama.cpp by testing v0.1.31 with the llama.cpp used in the newer release and on main? What is an approach to do...
I have a build of Ollama v0.1.31 running with the llama.cpp commit from v0.1.32 and am testing it right now.
More details about the program I am using: it benchmarks a fixed list of prompts against the models loaded in Ollama (see the sketch below). It sorts the list of models...
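For reference, here is a minimal sketch of the kind of harness I mean, using the `/api/generate` endpoint. The model names and prompts are placeholders, not my actual test set:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Placeholder lists standing in for my real model/prompt sets.
var models = []string{"llama2:7b", "mistral:7b"}
var prompts = []string{"Why is the sky blue?", "Write a haiku about GPUs."}

// generate sends one non-streaming request to a local Ollama server
// and returns how long it took.
func generate(model, prompt string) (time.Duration, error) {
	body, _ := json.Marshal(map[string]any{
		"model":  model,
		"prompt": prompt,
		"stream": false,
	})
	start := time.Now()
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return time.Since(start), nil
}

func main() {
	// Cycling through models forces a load/unload between prompts,
	// which is where I see the occasional CPU fallback.
	for _, m := range models {
		for _, p := range prompts {
			elapsed, err := generate(m, p)
			if err != nil {
				fmt.Println(m, "error:", err)
				continue
			}
			fmt.Printf("%s: %v\n", m, elapsed)
		}
	}
}
```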
I can run it under a debugger with breakpoints and will attempt to observe the execution during my testing.
@dhiltgen, I can reproduce the issue with `ollama run` on the `main` branch. I have a breakpoint on `memory.go` at [slog.Debug("insufficient VRAM to load any model layers")](https://github.com/ollama/ollama/blob/3450a57d4a4a2c74892318b7f3986bab6461b2fc/llm/memory.go#L99). Start a new ollama instance,...
@dhiltgen, I see there was a change in `gpu.go` from v0.1.31 to v0.1.32, and I have been watching the results of `GetGPUInfo()`. The OS is Ubuntu...
It appears Ollama is getting the wrong available memory each time. Maybe the model has not fully unloaded before the GPU info call is made? I tested a situation where...
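To check that hypothesis I have been watching free VRAM from outside Ollama. A rough sketch of that check, assuming `nvidia-smi` is on the PATH (the poll count and interval are arbitrary):

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// freeVRAM asks nvidia-smi for the free memory (MiB) on GPU 0.
func freeVRAM() (int, error) {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.free",
		"--format=csv,noheader,nounits",
		"--id=0").Output()
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(strings.Split(string(out), "\n")[0]))
}

func main() {
	// Poll right after requesting a model switch; if the previous runner has
	// not fully unloaded yet, free memory only climbs back up after a delay.
	for i := 0; i < 20; i++ {
		free, err := freeVRAM()
		if err != nil {
			fmt.Println("nvidia-smi error:", err)
			return
		}
		fmt.Printf("%s free VRAM: %d MiB\n", time.Now().Format("15:04:05.000"), free)
		time.Sleep(250 * time.Millisecond)
	}
}
```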
If I have breakpoints in the code where the runner is unloaded and right at the beginning of `initGPUHandles()`, I can't reproduce the issue. I think...
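Since the breakpoints themselves seem to hide the race, a lighter-weight option might be timestamped debug logging around the unload and `initGPUHandles()` paths instead of stopping execution. A hypothetical example of the kind of log lines I mean (not actual Ollama code):

```go
package main

import (
	"log/slog"
	"os"
	"time"
)

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug}))

	// In place of a breakpoint, emit a timestamped event at each point of
	// interest; the ordering of these lines should expose the race without
	// changing the timing the way a stopped debugger does.
	logger.Debug("runner unload requested", "t", time.Now().UnixNano())
	// ... unload work would happen here ...
	logger.Debug("entering initGPUHandles", "t", time.Now().UnixNano())
}
```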