# Bug Report: Context Loss in mistral-nemo Model

## Description
I am using the mistral-nemo model in Ollama, which is supposed to have a 128k context window. However, I am observing context loss with this model, and with the llama3 model as well.
## Steps to Reproduce
- Run the command `ollama run mistral-nemo`.
- Use the following prompt:
'{id: 0, name: "Alexander City"}'
'{id: 1, name: "Andalusia"}'
'{id: 2, name: "Anniston"}'
'{id: 3, name: "Athens"}'
'{id: 4, name: "Atmore"}'
'{id: 5, name: "Auburn"}'
'{id: 6, name: "Bessemer"}'
'{id: 7, name: "Birmingham"}'
'{id: 8, name: "Chickasaw"}'
'{id: 9, name: "Clanton"}'
'{id: 10, name: "Cullman"}'
'{id: 11, name: "Decatur"}'
'{id: 12, name: "Demopolis"}'
'{id: 13, name: "Dothan"}'
'{id: 14, name: "Enterprise"}'
'{id: 15, name: "Eufaula"}'
'{id: 16, name: "Florence"}'
'{id: 17, name: "Fort Payne"}'
'{id: 18, name: "Gadsden"}'
'{id: 19, name: "Greenville"}'
'{id: 20, name: "Guntersville"}'
'{id: 21, name: "Huntsville"}'
'{id: 22, name: "Jasper"}'
'{id: 23, name: "Marion"}'
'{id: 24, name: "Mobile"}'
'{id: 25, name: "Montgomery"}'
'{id: 26, name: "Opelika"}'
'{id: 27, name: "Ozark"}'
'{id: 28, name: "Phenix City"}'
'{id: 29, name: "Prichard"}'
'{id: 30, name: "Scottsboro"}'
'{id: 31, name: "Selma"}'
'{id: 32, name: "Sheffield"}'
'{id: 33, name: "Sylacauga"}'
'{id: 34, name: "Talladega"}'
'{id: 35, name: "Troy"}'
'{id: 36, name: "Tuscaloosa"}'
'{id: 37, name: "Tuscumbia"}'
'{id: 38, name: "Tuskegee"}'
'{id: 39, name: "Alaska"}'
'{id: 40, name: "Anchorage"}'
'{id: 41, name: "Cordova"}'
'{id: 42, name: "Fairbanks"}'
'{id: 43, name: "Haines"}'
'{id: 44, name: "Homer"}'
'{id: 45, name: "Juneau"}'
'{id: 46, name: "Ketchikan"}'
'{id: 47, name: "Kodiak"}'
'{id: 48, name: "Kotzebue"}'
'{id: 49, name: "Nome"}'
'{id: 50, name: "Palmer"}'
'{id: 51, name: "Seward"}'
'{id: 52, name: "Sitka"}'
'{id: 53, name: "Skagway"}'
'{id: 54, name: "Valdez"}'
'{id: 55, name: "Arizona"}'
'{id: 56, name: "Ajo"}'
'{id: 57, name: "Avondale"}'
'{id: 58, name: "Bisbee"}'
'{id: 59, name: "Casa Grande"}'
'{id: 60, name: "Chandler"}'
'{id: 61, name: "Clifton"}'
'{id: 62, name: "Douglas"}'
'{id: 63, name: "Flagstaff"}'
'{id: 64, name: "Florence"}'
'{id: 65, name: "Gila Bend"}'
'{id: 66, name: "Glendale"}'
'{id: 67, name: "Globe"}'
'{id: 68, name: "Kingman"}'
'{id: 69, name: "Lake Havasu City"}'
'{id: 70, name: "Mesa"}'
'{id: 71, name: "Nogales"}'
'{id: 72, name: "Oraibi"}'
'{id: 73, name: "Phoenix"}'
'{id: 74, name: "Prescott"}'
'{id: 75, name: "Scottsdale"}'
'{id: 76, name: "Sierra Vista"}'
'{id: 77, name: "Tempe"}'
'{id: 78, name: "Tombstone"}'
'{id: 79, name: "Tucson"}'
'{id: 80, name: "Walpi"}'
'{id: 81, name: "Window Rock"}'
'{id: 82, name: "Winslow"}'
'{id: 83, name: "Yuma"}'
'{id: 84, name: "Arkansas"}'
'{id: 85, name: "Arkadelphia"}'
'{id: 86, name: "Arkansas Post"}'
'{id: 87, name: "Batesville"}'
'{id: 88, name: "Benton"}'
'{id: 89, name: "Blytheville"}'
'{id: 90, name: "Camden"}'
'{id: 91, name: "Conway"}'
'{id: 92, name: "Crossett"}'
'{id: 93, name: "El Dorado"}'
'{id: 94, name: "Fayetteville"}'
'{id: 95, name: "Forrest City"}'
'{id: 96, name: "Fort Smith"}'
'{id: 97, name: "Harrison"}'
'{id: 98, name: "Helena"}'
'{id: 99, name: "Hope"}'
'{id: 100, name: "Hot Springs"}'
'{id: 101, name: "Jacksonville"}'
'{id: 102, name: "Jonesboro"}'
'{id: 103, name: "Little Rock"}'
'{id: 104, name: "Magnolia"}'
'{id: 105, name: "Morrilton"}'
'{id: 106, name: "Newport"}'
'{id: 107, name: "North Little Rock"}'
'{id: 108, name: "Osceola"}'
'{id: 109, name: "Pine Bluff"}'
'{id: 110, name: "Rogers"}'
'{id: 111, name: "Searcy"}'
'{id: 112, name: "Stuttgart"}'
'{id: 113, name: "Van Buren"}'
'{id: 114, name: "West Memphis"}'
'{id: 115, name: "California"}'
'{id: 116, name: "Alameda"}'
'{id: 117, name: "Alhambra"}'
'{id: 118, name: "Anaheim"}'
'{id: 119, name: "Antioch"}'
'{id: 120, name: "Arcadia"}'
'{id: 121, name: "Bakersfield"}'
'{id: 122, name: "Barstow"}'
'{id: 123, name: "Belmont"}'
'{id: 124, name: "Berkeley"}'
'{id: 125, name: "Beverly Hills"}'
'{id: 126, name: "Brea"}'
'{id: 127, name: "Buena Park"}'
'{id: 128, name: "Burbank"}'
'{id: 129, name: "Calexico"}'
'{id: 130, name: "Calistoga"}'
'{id: 131, name: "Carlsbad"}'
'{id: 132, name: "Carmel"}'
'{id: 133, name: "Chico"}'
'{id: 134, name: "Chula Vista"}'
'{id: 135, name: "Claremont"}'
'{id: 136, name: "Compton"}'
'{id: 137, name: "Concord"}'
'{id: 138, name: "Corona"}'
'{id: 139, name: "Coronado"}'
'{id: 140, name: "Costa Mesa"}'
'{id: 141, name: "Culver City"}'
'{id: 142, name: "Daly City"}'
'{id: 143, name: "Davis"}'
'{id: 144, name: "Downey"}'
'{id: 145, name: "El Centro"}'
'{id: 146, name: "El Cerrito"}'
'{id: 147, name: "El Monte"}'
'{id: 148, name: "Escondido"}'
'{id: 149, name: "Eureka"}'
'{id: 150, name: "Fairfield"}'
- Ask: "What's the city with id 1?"
The response is:
The city with id 1 is not included in the provided list. The smallest id number present is for Arkadelphia, which has an id of 85.
This indicates that the model is losing context since Andalusia should be the correct answer.
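As a side note, a long repro prompt like the one above does not have to be pasted by hand; it can be generated from a short script. This is a minimal sketch in Python, with the city list truncated to the first few entries shown above for brevity:

```python
# Build a repro prompt of '{id: N, name: "..."}' lines, as in the steps above.
# Only a few entries are included here; extend the list to reproduce the full test.
cities = ["Alexander City", "Andalusia", "Anniston", "Athens", "Atmore"]

prompt = "\n".join(f"'{{id: {i}, name: \"{name}\"}}'" for i, name in enumerate(cities))
prompt += "\n\nWhat's the city with id 1?"

print(prompt)
```

The output can be piped into `ollama run mistral-nemo` to repeat the experiment at different prompt lengths.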
## Temporary Solution
I found that setting the `num_ctx` parameter to 120000 before the prompt resolves the issue. For example:

```
/set parameter num_ctx 120000
```
Then, when repeating the experiment, I receive the correct response:

> The city with id 1 is "Andalusia".
## Additional Information
Running `ollama show mistral-nemo` provides the following details:

```
  Model
    arch                llama
    parameters          12.2B
    quantization        Q4_0
    context length      1.024e+06
    embedding length    5120

  Parameters
    stop    "[INST]"
    stop    "[/INST]"

  License
    Apache License
```
## API Usage
When using the API directly with curl, the problem persists even if I use:

```json
"options": {"num_ctx": 120000}
```

To resolve this, I need to pass:

```json
{"role": "user", "content": "/set parameter num_ctx 120000"}
```

within `messages` during the request.
## Conclusion

There is a bug in Ollama where the mistral-nemo (and llama3) models lose context unless the `num_ctx` parameter is explicitly set within the prompt.
- OS: Linux
- GPU: Nvidia
- CPU: Intel
- Ollama version: 0.2.8
As background, Ollama API calls (and CLI prompts via `ollama run`) that don't specify a context length use a default of 2048 tokens. If you need more context, you either need to specify that in the API call or change the default. `options = {"num_ctx": 120000}` is the canonical way to do that in an API call; if it's not working, that's a bug, and it would be great if you could send server logs demonstrating the issue.

Adding `{"role": "user", "content": "/set parameter num_ctx 120000"}` to an API call will not change the context window, because `/set parameter` is interpreted by the CLI (`ollama run`), which is not involved in inference during an API call.
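To illustrate the canonical way, this sketch builds the JSON body for Ollama's `/api/chat` endpoint with `num_ctx` in the top-level `options` object. It uses only the standard library and does not send the request, so it can be inspected offline:

```python
import json

# Canonical way to enlarge the context window for a single API call:
# pass num_ctx in the top-level "options" object, not inside a message.
body = {
    "model": "mistral-nemo",
    "messages": [{"role": "user", "content": "What's the city with id 1?"}],
    "options": {"num_ctx": 120000},  # applies to this request only
}

# This is the payload you would POST to http://localhost:11434/api/chat,
# e.g. curl http://localhost:11434/api/chat -d "$payload"
payload = json.dumps(body)
print(payload)
```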
Note that the OpenAI compatibility endpoints (`localhost:11434/v1/`) follow the OpenAI API standard in that they don't allow setting the size of the context window. If you need a larger context window and want to use the OpenAI endpoints, you need to set `PARAMETER num_ctx` as detailed below.
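For instance, a request body for the OpenAI-compatible endpoint (`POST http://localhost:11434/v1/chat/completions`) only carries standard OpenAI fields and has nowhere to put `num_ctx`, so the larger context has to be baked into the model itself. The model name `mistral-nemo-120k` in this sketch assumes the derived model created in the steps that follow:

```json
{
  "model": "mistral-nemo-120k",
  "messages": [
    {"role": "user", "content": "What's the city with id 1?"}
  ]
}
```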
To change the default context window of a model, and so do away with having to add options to an API call, you need to create a new model. This is easy to do and, if you follow the instructions, does not require additional disk space.
First, get a copy of the current modelfile:

```
ollama show --modelfile mistral-nemo > Modelfile
```
Now edit the file and do two things:

- Where it says `FROM` followed by a long file path, change the file path to the name of the model. For example, `FROM /root/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94` becomes `FROM mistral-nemo`.
- Create a new line in the file and add:

```
PARAMETER num_ctx 120000
```
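Put together, the edited Modelfile might look something like this. This is a sketch: keep any TEMPLATE, PARAMETER, and LICENSE lines from your actual `ollama show --modelfile` output, and only change the `FROM` line and add the `num_ctx` parameter:

```
FROM mistral-nemo
PARAMETER num_ctx 120000
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```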
Create the new model:

```
ollama create -f Modelfile mistral-nemo-120k
```
Verify the change in the default context length by checking `num_ctx` under Parameters:
```
$ ollama show mistral-nemo-120k
  Model
    arch                llama
    parameters          12.2B
    quantization        Q4_0
    context length      1.024e+06
    embedding length    5120

  Parameters
    num_ctx    120000
    stop       "[INST]"
    stop       "[/INST]"
```
This should solve your context problem; if it persists, please follow up with server logs.
> As background, ollama API calls (and CLI prompts via `ollama run`) that don't specify a context length, use a context length of 2048 tokens.
This should be mentioned somewhere in the documentation, if it isn't already.
https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size
thanks @rick-github :) :+1:
> ollama create -f Modelfile mistral-nemo-120k

This doesn't seem to work when I try it with llama3.
What exactly doesn't work?
@rick-github thanks for the awesome explanation! Will close this for now. @rodolfo-nobrega @anubhav-agrawal-mu-sigma feel free to follow up if things still aren't working.
https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size

The FAQ needs to be updated with your example; it took a lot of time to understand and configure the solution because it is NOT mentioned in the FAQ: https://github.com/ollama/ollama/issues/5965#issuecomment-2252354726
I agree, I have a WIP to update various bits of documentation but haven't pushed a PR yet.
It should be noted that changing the context size of a model via the Modelfile is mentioned, but in the OpenAI doc, not the FAQ.