
For Ollama add a configuration parameter for context size

Open rubixhacker opened this issue 10 months ago • 3 comments

Ollama defaults to a context size of 2048 tokens, and Goose often exceeds that window; when this happens, Ollama truncates the input. This leads to suboptimal results from the LLM.

Output from Ollama when this occurs:

time=2025-02-16T13:27:35.103Z level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=3520 keep=4 new=2048

When sending the request to Ollama, the context size can be specified by adding the following to the payload:

"options": { "num_ctx": 4096 }
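
For example, a minimal sketch against Ollama's native /api/chat endpoint (the model name, the 4096 value, and the default port 11434 are assumptions for illustration):

# sketch: raise the context window per request via Ollama's native API
# (model name, num_ctx value, and port are illustrative)
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:0.5b",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false,
  "options": { "num_ctx": 4096 }
}'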

rubixhacker avatar Feb 16 '25 13:02 rubixhacker

Agreed. And I hope to see the LLM's context (CTX) usage displayed, like in Gemini studio.

addhyh avatar Feb 20 '25 08:02 addhyh

Did a bit of a dive into this. Unfortunately, Goose is hitting Ollama with the OpenAI API v1/chat/completions, which does not expose any direct way of modifying the model's context size.

The recommendation (seen in the above link) is to create your own Ollama model with the desired context size and point to it with the same API. This seems convoluted and not the right solution for this problem. A short-term solution, assuming you're running an Ollama service locally, would be to update your existing instance with the desired context window, as sketched below.
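
A minimal sketch of that short-term workaround, using Ollama's interactive /set and /save commands (the model name and the saved name here are assumptions for illustration):

# model name and saved name are illustrative
$ ollama run qwen2.5:0.5b
>>> /set parameter num_ctx 4096
>>> /save qwen2.5-4k
>>> /bye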

There was work planned to expose num_ctx through the OpenAI-compatible endpoint, but it was discarded. Instead, what seems to be in progress is setting the context length via an environment variable.

If the above work goes through we can maybe add a helper in goose to set the environment variable, but I'll defer to the main Goose team to decide whether that's appropriate.

tiensi avatar Feb 22 '25 00:02 tiensi

Ollama is just a toy for children developers; a 2k context size cannot do anything. And people do not want to change their OpenAI SDK to Ollama's.

CrazyBoyM avatar Feb 25 '25 00:02 CrazyBoyM

+1 for this enhancement. When trying to configure qwen2.5:0.5B, the error is:

$ goose configure

This will update your existing config file
  if you prefer, you can edit it directly at /home/jantona/.config/goose/config.yaml

┌   goose-configure 
│
◇  What would you like to configure?
│  Configure Providers 
│
◇  Which model provider should we use?
│  Ollama 
│
●  OLLAMA_HOST is already configured
│  
◇  Would you like to update this value?
│  No 
│
◇  Enter a model from that provider:
│  qwen2.5:0.5B
│
◇  Request failed: model "qwen2.5:0.5B" not found, try pulling it first (type: api_error) (status 404)

Yet I have already pulled the model from the library:

$ ollama pull qwen2.5:0.5B && ollama run qwen2.5:0.5B
pulling manifest 
pulling c5396e06af29... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 397 MB                         
pulling 66b9ea09bd5b... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏   68 B                         
pulling eb4402837c78... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB                         
pulling 832dd9e00a68... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏  11 KB                         
pulling 005f95c74751... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏  490 B                         
verifying sha256 digest 
writing manifest 
success 
>>> /show info
  Model
    architecture        qwen2      
    parameters          494.03M    
    context length      32768      
    embedding length    896        
    quantization        Q4_K_M     

  System
    You are Qwen, created by Alibaba Cloud. You are a helpful assistant.    

  License
    Apache License               
    Version 2.0, January 2004    

>>> Send a message (/? for help)

Adding @jantonacci (myself) for visibility.

jantonacci avatar Mar 24 '25 18:03 jantonacci

For the context size configuration question: per @tiensi's comment above, with recent Ollama versions you can now start the server with the OLLAMA_CONTEXT_LENGTH env var set to the context length you want.

e.g. OLLAMA_CONTEXT_LENGTH=32768 ollama serve

alicehau avatar Mar 24 '25 20:03 alicehau

@jantonacci what port are you running Ollama on? If no port is provided, Goose assumes it's on 11434.

alicehau avatar Mar 24 '25 20:03 alicehau

If no port is provided, Goose assumes it's on 11434.

Correct. I am using the default port and can get some models to connect.
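
One way to double-check is to ask the server directly which model tags it has; a sketch, assuming the default port (/api/tags is Ollama's model-listing endpoint, and Goose must send the tag string exactly as listed):

# list locally available models and their exact tag strings
$ curl http://localhost:11434/api/tags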

jantonacci avatar Mar 24 '25 22:03 jantonacci

To set the context length for any Ollama model through the OpenAI completions API, you can simply use Modelfiles to create a new wrapper for the model you wish to use. This comment shows how to do it:

  1. Open a terminal.
  2. Pull your preferred model.
  3. Export its modelfile (example uses gemma2; change it to your model): ollama show gemma2:9b --modelfile > gemma.modelfile
  4. Open that file using nano/pico: nano gemma.modelfile
  5. Scroll until you find:

{{ .Response }}<end_of_turn>
PARAMETER stop <start_of_turn>
PARAMETER stop <end_of_turn>
LICENSE Gemma Terms of Use Last modified: February 21, 2024...

(or, for a Llama model, just scroll until you find the keyword "PARAMETER"), then add the line PARAMETER num_ctx 8192 below "PARAMETER stop <end_of_turn>".

  6. Finally, create a new model based on our custom modelfile: ollama create gemma_8192 --file gemma.modelfile
  7. Check out our new model with: ollama list

NAME               ID            SIZE    MODIFIED
gemma_8192:latest  30ed6114610b  5.4 GB  2 minutes ago
gemma2:9b          ff02c3702f32  5.4 GB  4 minutes ago

That's it, just go and use that model. I was actually able to change num_ctx to a length of 8192 (a condensed version of these steps is sketched below).
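
A condensed sketch of the same steps, assuming a Modelfile's FROM line can reference an already-pulled local model (which then inherits its template and remaining parameters):

# write a two-line modelfile that wraps the existing model with a larger context
$ cat > gemma.modelfile <<'EOF'
FROM gemma2:9b
PARAMETER num_ctx 8192
EOF
$ ollama create gemma_8192 --file gemma.modelfile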

LowYieldFire avatar Apr 29 '25 12:04 LowYieldFire

Given the discussion and the age of this issue, going to close.

DOsinga avatar Jul 01 '25 20:07 DOsinga