For Ollama, add a configuration parameter for context size
Ollama defaults to a context size of 2048 tokens, and Goose often exceeds that window; when this happens, Ollama truncates the input, which leads to suboptimal results from the LLM.
Output from Ollama when this occurs:
time=2025-02-16T13:27:35.103Z level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=3520 keep=4 new=2048
When sending the request to Ollama, the context size can be specified by adding the following to the payload:
"options": { "num_ctx": 4096 }
Agree. And I'd like to see the LLM's context usage displayed, like Gemini Studio does.
Did a bit of a dive into this. Unfortunately, Goose is hitting Ollama with the OpenAI-compatible API (v1/chat/completions), which does not expose any direct way of modifying the model's context size.
The recommendation (seen in the above link) is to create your own Ollama model with the desired context size and point to it through the same API. This seems convoluted and not the right solution for this problem. A short-term solution, assuming you're running an Ollama service locally, would be to update your existing instance with the desired context window, for example as sketched below.
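One way to do that (a rough sketch, assuming a recent Ollama CLI; gemma2:9b and the saved name gemma2-4k are just illustrative) is to set the parameter interactively and save the result as a new model, then point Goose at it:
ollama run gemma2:9b
>>> /set parameter num_ctx 4096
>>> /save gemma2-4k
>>> /bye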
There was work planned to expose num_ctx through the OpenAI-compatible endpoint, but it was discarded. Instead, what seems to be in progress is setting the context length via an environment variable.
If the above work goes through, we could add a helper in Goose to set the environment variable, but I'll defer to the main Goose team to decide whether that's appropriate.
Ollama is just a toy for beginner developers; a 2k context size can't do anything useful, and people don't want to change their OpenAI SDK to Ollama's.
+1 for this enhancement. When trying to configure qwen2.5:0.5B, the error is:
$ goose configure
This will update your existing config file
if you prefer, you can edit it directly at /home/jantona/.config/goose/config.yaml
┌ goose-configure
│
◇ What would you like to configure?
│ Configure Providers
│
◇ Which model provider should we use?
│ Ollama
│
● OLLAMA_HOST is already configured
│
◇ Would you like to update this value?
│ No
│
◇ Enter a model from that provider:
│ qwen2.5:0.5B
│
◇ Request failed: model "qwen2.5:0.5B" not found, try pulling it first (type: api_error) (status 404)
Yet I have already pulled the model from the library:
$ ollama pull qwen2.5:0.5B && ollama run qwen2.5:0.5B
pulling manifest
pulling c5396e06af29... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 397 MB
pulling 66b9ea09bd5b... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 68 B
pulling eb4402837c78... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB
pulling 832dd9e00a68... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling 005f95c74751... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 490 B
verifying sha256 digest
writing manifest
success
>>> /show info
Model
architecture qwen2
parameters 494.03M
context length 32768
embedding length 896
quantization Q4_K_M
System
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
License
Apache License
Version 2.0, January 2004
>>> Send a message (/? for help)
Adding @jantonacci (myself) for visibility.
For the context size configuration question: per @tiensi's comment above, with recent Ollama versions you can now start the server with the OLLAMA_CONTEXT_LENGTH env var set to the context length you want, e.g.:
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
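If Ollama runs as a systemd service (which, as far as I know, the official Linux install script sets up under the unit name ollama; the drop-in file name here is just illustrative), a persistent way to set it is a drop-in override:
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/context.conf <<'EOF'
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=32768"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama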
@jantonacci what port are you running Ollama on? If no port is provided, Goose assumes it's on 11434.
If no port is provided, Goose assumes it's on 11434.
Correct. I am using the default port and can get some models to connect.
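As a quick sanity check (assuming the default host and port), it may help to compare the exact model tags the server reports with what you enter in goose configure:
curl http://localhost:11434/api/tags
# or, equivalently, from the CLI:
ollama list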
To set the context length for any Ollama model through the OpenAI completions API, you can use a Modelfile to create a new wrapper for the model you wish to use. This comment shows how to do it (a shorter variant is sketched after the list):
- Open a terminal.
- Pull your preferred model.
- Then export the model's Modelfile (example uses gemma2; substitute your model):
  ollama show gemma2:9b --modelfile > gemma.modelfile
- Open that file using nano/pico:
  nano gemma.modelfile
- Scroll until you find:
  {{ .Response }}<end_of_turn>
  PARAMETER stop <start_of_turn>
  PARAMETER stop <end_of_turn>
  LICENSE Gemma Terms of Use Last modified: February 21, 2024...
  (or, for Llama, just scroll until you find the keyword "PARAMETER"), then add this line below "PARAMETER stop <end_of_turn>":
  PARAMETER num_ctx 8192
- Finally, create a new model based on the custom Modelfile:
  ollama create gemma_8192 --file gemma.modelfile
- Next, check out the new model with:
  ollama list
  NAME                ID              SIZE      MODIFIED
  gemma_8192:latest   30ed6114610b    5.4 GB    2 minutes ago
  gemma2:9b           ff02c3702f32    5.4 GB    4 minutes ago
That's it; just go and use that model. I was actually able to change num_ctx to 8192 this way.
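A shorter variant of the above, as a sketch (the model name gemma2-8k and the Modelfile name are just illustrative): write a minimal Modelfile that inherits everything from the existing model and only overrides num_ctx, then create the wrapper model from it.
cat > gemma2-8k.modelfile <<'EOF'
FROM gemma2:9b
PARAMETER num_ctx 8192
EOF
ollama create gemma2-8k --file gemma2-8k.modelfile
# verify the new model shows up
ollama list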
Given the discussion and the age of this issue, going to close.