`n_ctx` in your chat completion example should be `num_ctx`.
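If it helps, here's a minimal sketch of passing `num_ctx` through the `options` object of a chat request against the REST API (the model name, prompt, and value are just placeholders):

```python
import requests

# Sketch: num_ctx (not n_ctx) goes in the "options" object of the request.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
        "options": {"num_ctx": 8192},  # context window size in tokens
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```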
What prompt?
What's the goal? Can you provide a complete Python script that demonstrates the problem?
I ran the script with the suggested prompt; the final output is below. The last line (`By incorporating these enhancements and suggestions, this revised article...`) reads as an appropriate conclusion, implying...
I re-ran it with `num_ctx=2048` and the output still seems reasonable:

```
Improved text (Iteration 5): **Welcome Home: A Comprehensive Guide to Living in Las Vegas** As you sip your morning coffee,...
```
The first run (https://github.com/ollama/ollama/issues/6286#issuecomment-2307292747) used the script exactly as you supplied it, with `num_ctx=128000`. The other iterations were similar in content, some shorter, but looked consistent and complete.
Have you tried changing [`num_predict`](https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values:~:text=tfs_z%201-,num_predict,-Maximum%20number%20of)? It's the ollama alias for [--predict](https://github.com/ggerganov/llama.cpp/tree/master/examples/main#number-of-tokens-to-predict); set it in the options the same way as `num_ctx`. I don't see how it would be different for...
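As a sketch with the Python client (model name and values are placeholders), `num_predict` sits in the same `options` dict as `num_ctx`:

```python
import ollama

# Sketch: cap the number of generated tokens with num_predict,
# passed alongside num_ctx in options. Placeholder model and values.
resp = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize the article."}],
    options={"num_ctx": 8192, "num_predict": 1024},
)
print(resp["message"]["content"])
```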
So the problem occurs only with German text, yet the examples you gave are in English and appear to be fine?
mmap doesn't affect the memory check. If your system doesn't have enough memory to load the model, you need to increase it by [adding swap](https://github.com/ollama/ollama/issues/6918#issuecomment-2488651203).
OLLAMA_LLM_LIBRARY is not being ignored; the chosen runner is a CPU-based one:

```
time=2024-08-09T09:58:06.924-07:00 level=INFO source=server.go:172 msg="user override" OLLAMA_LLM_LIBRARY=cpu path=/tmp/ollama1197517005/runners/cpu
time=2024-08-09T09:58:06.924-07:00 level=INFO source=server.go:390 msg="starting llama server" cmd="/tmp/ollama1197517005/runners/cpu/ollama_llama_server --model /home/yuri/.ollama/models/blobs/sha256-ef311de6af9db043d51ca4b1e766c28e0a1ac41d60420fed5e001dc470c64b77...
```