
Requesting ability to set max_tokens for ollama models

Open · clee opened this issue 1 year ago • 1 comment

Check for existing issues

  • [X] Completed

Describe the feature

There doesn't seem to be any way to set the max_tokens for an ollama assistant model with the 0.147.0 nightly I'm using. I've attempted to set it in the config in a variety of ways and none of them had any visible impact in the UI. I always see "n / 2k" in the top right corner, no matter which model I select.
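For concreteness, the per-model override being requested would look something like the following settings.json sketch (hypothetical at the time of filing; a setting of this shape later shipped, as shown further down the thread):

{
  "language_models": {
    "ollama": {
      "available_models": [
        {
          // hypothetical at time of filing: per-model context-size override
          "provider": "ollama",
          "name": "llama3.1:latest",
          "max_tokens": 65536
        }
      ]
    }
  }
}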

If applicable, add mockups / screenshots to help present your vision of the feature

[screenshot: assistant panel showing the "n / 2k" token count]

clee · Jul 25 '24

I was also quite surprised to see it being set to 2k as a limit. At first I thought I might have misconfigured my model; I tried just llama3.1:8b and no dice, then swapped to gemma2:27b and was still stuck at 2k. This was with num_ctx set as well.

JoshStrobl · Aug 24 '24

@notpeter I love what you've done with #16877, but I would disagree that what's implemented there closes this; it's great to stop using 2048 as the default, but this issue as filed is about allowing the user to specify the context size, and you're currently hardcoding the returned context size based on the model name. (Thanks for reopening!)

clee · Aug 26 '24

Understood. I mostly implemented #16877 as a stop-the-bleeding solution to support >2048 tokens in the models where it's possible. I can understand the desire for explicit control over this value too.

notpeter · Aug 26 '24

Thank you @notpeter for even the stopgap solution; I'm gonna have to switch over to nightlies to give it a go!

Edit: Ah yeah, I need to be able to explicitly set the token limit :/

JoshStrobl · Aug 27 '24

FYI Zed Preview will have it tomorrow. Stable in a week.

notpeter · Aug 27 '24

Zed Preview now supports custom ollama token limits via settings.json. Out of the box you should see that llama3.1:latest now uses 16384, but you can tweak this to match your hardware's capabilities.

{
  "language_models": {
    "ollama": {
      "available_models": [
        {
          "provider": "ollama",
          "name": "llama3.1:latest",
          "max_tokens": 65536
        }
      ]
    }
  }
}

I've also updated the Zed Ollama Configuration Docs. Thanks for reporting!

notpeter · Aug 30 '24

I created a custom Modelfile and built a model from it, for example: ollama create myllama -f /path/to/custommodelfile, where custommodelfile contains:

FROM llama3.1:latest
PARAMETER num_ctx 65536

In Zed I load myllama, but the max tokens are still shown as 2k.

So...

In the Zed config I added:

{
  "language_models": {
    "ollama": {
      "available_models": [
        {
          "provider": "ollama",
          "name": "llama3.1:latest",
          "max_tokens": 65536
        }
      ]
    }
  }
}

and when I open the model list, I get two models both named "myllama": one showing 2k max tokens and one showing 65536 max tokens.

Is this intended? Does Zed not know that myllama actually sets its max tokens via num_ctx in the Modelfile? Or do I need to put something in the Modelfile so Zed can read num_ctx and correctly show the max tokens?

RickySupriyadi · Sep 19 '24

You might want to add a "display_name" to your available_models object to better differentiate the two. We ask ollama to list the available models, but I don't believe it exposes an API method to determine the custom num_ctx specified for a model in a Modelfile -- instead we have a hardcoded list of common models in the code, with a default of 2048. So of the two models you see, one is sourced from that list (2048) and one comes from your available_models in settings.
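For example, something like this (a sketch; the display_name value shown is illustrative, not required):

{
  "language_models": {
    "ollama": {
      "available_models": [
        {
          "provider": "ollama",
          "name": "myllama",
          // illustrative label so the settings-sourced entry is distinguishable in the model picker
          "display_name": "myllama 64k",
          "max_tokens": 65536
        }
      ]
    }
  }
}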

https://github.com/zed-industries/zed/blob/d3b46fd2f82abd6af851497980a4e378fe258228/crates/ollama/src/ollama.rs#L73-L89

notpeter · Sep 19 '24