[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.
Model qwen2:72b-instruct-q4_0 is loaded at Aug 11 09:20:23 and then again at 09:24:44, with no indication of a model unload in between, even though OLLAMA_KEEP_ALIVE=1h0m0s is set. Can you add `OLLAMA_DEBUG=1` to your server environment...
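For reference, assuming a systemd-managed Linux install (adjust for your setup), one way to enable debug logging and then follow the server logs might look like:

```console
$ sudo systemctl edit ollama.service
# in the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
$ journalctl -e -u ollama    # tail the server logs
```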
```
Aug 14 18:24:04 neuron0 ollama[35955]: time=2024-08-14T18:24:04.434Z level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=81 layers.offload=16 layers.split=8,8 memory.available="[7.8 GiB 7.8 GiB]" memory.required.full="49.8 GiB" memory.required.partial="15.4 GiB" memory.required.kv="5.0 GiB" memory.required.allocations="[7.7 GiB 7.7 GiB]"
...
```
Contents of the Modelfile? Which model are you trying to import?
MiniCPM-V-2_6 is not supported yet. Once https://github.com/ggerganov/llama.cpp/pull/7599 is merged, it will work.
What commands are you running to create the model?
Where are you getting the GGUF files from, or how are you creating them?
The ggml-model-Q4_K_M.gguf that I downloaded has 8 extra null bytes at the end, which confuse ollama. If your copy of the file is 4681089344 bytes long, this is the problem...
```console
$ xxd -R never -s -32 ggml-model-Q4_K_M.gguf
11703c120: aac7 5652 a521 b9a5 65b4 65b8 aca0 5554  ..VR.!..e.e...UT
11703c130: 80ad a564 9499 8900 0000 0000 0000 0000  ...d............
```
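If you want to check your own copy, a quick way (assuming GNU `stat`, `tail`, and `truncate` are available; make a backup of the file first) is to verify the size, inspect the last 8 bytes, and strip them if they are all zero:

```console
$ stat -c %s ggml-model-Q4_K_M.gguf
4681089344
$ tail -c 8 ggml-model-Q4_K_M.gguf | xxd
00000000: 0000 0000 0000 0000                      ........
$ truncate -s -8 ggml-model-Q4_K_M.gguf   # drop the 8 trailing null bytes
```

With the trailing null bytes removed, the file should then import cleanly.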
I have a modified ollama that I used to detect this problem.

```diff
diff --git a/llm/ggml.go b/llm/ggml.go
index e857d4b8..0f92965a 100644
--- a/llm/ggml.go
+++ b/llm/ggml.go
@@ -324,6 +324,7 @@ func DecodeGGML(rs...
```