
Ability to use a different model for tab completion

Open jbohnslav opened this issue 1 year ago • 4 comments

Validations

  • [X] I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • [X] I'm not able to find an open issue that requests the same enhancement

Problem

For question answering, RAG, editing, code generation, etc. I want to use the biggest, slowest model I can fit on my machine, as that will have the highest accuracy. When I'm asking questions or using /edit, I expect that to take some time. A good model for this might be DeepSeek33B. This should be an instruction-tuned model.

However, for tab completion, I want it to be fast, even at the cost of accuracy like DeepSeek1B. Furthermore, I don't want it to be instruction tuned, as the next-token pretraining objective is perfect for tab completion.

Solution

In config.json, we should be able to specify which model we want for tab completion. Furthermore, the codebase should be able to handle sending tab completion requests to one model and all other requests to another model.
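
For illustration, the idea is something roughly like this in config.json (the model tags below are placeholders, and the exact field names would be up to the config schema):

"models": [
    {
      "title": "Chat (large, instruction-tuned)",
      "provider": "ollama",
      "model": "deepseek-coder:33b-instruct"
    }
  ],
"tabAutocompleteModel": {
    "title": "Autocomplete (small, base)",
    "provider": "ollama",
    "model": "deepseek-coder:1.3b-base"
  }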

jbohnslav avatar Feb 25 '24 08:02 jbohnslav

@jbohnslav this is already possible! Check out the docs here for setting up a custom tab autocomplete model. And here for setting up a chat/quick edit model

Let me know if I can help at all with setting up : )

sestinj avatar Feb 25 '24 21:02 sestinj

@sestinj: Is it currently only possible with Ollama? I tried with LM Studio but wasn't able to succeed. Maybe you can help with the setup if LM Studio is supported? What I configured in config.json (trying different endpoints) is the following, below the models section:

"tabAutocompleteModel": { 
    "title": "Tab Autocomplete Model",
    "provider": "lmstudio",
    "model": "Phi2",
    "apiBase": "http://localhost:1234/v1/models"
  },

JosefLaumer avatar Feb 29 '24 09:02 JosefLaumer

@JosefLaumer No need to change the API Base, but if you wanted to, it should be http://localhost:1234/v1 (we default to this). I believe this would solve your problem:

"tabAutocompleteModel": { 
    "title": "Tab Autocomplete Model",
    "provider": "lmstudio",
    "model": "Phi2"
  },
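
If you do want to keep apiBase in there, it should point at the /v1 root rather than the /v1/models endpoint:

"tabAutocompleteModel": { 
    "title": "Tab Autocomplete Model",
    "provider": "lmstudio",
    "model": "Phi2",
    "apiBase": "http://localhost:1234/v1"
  },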

sestinj avatar Feb 29 '24 21:02 sestinj

I have a similar problem: I run the text-generation-webui API and it just tells me that openai doesn't have Mistral-7B as a model. When I add text-gen-webui as the provider, it tells me it's unknown.

Wladastic avatar Mar 02 '24 11:03 Wladastic

I'm using a remote machine to run Ollama Models, trying to use the same model for chat and auto tab completion.

"models": [
    {
      "title": "Ollama Remote",
      "model": "codestral:22b",
      "completionOptions": {
        "keepAlive": 3000000,
      },
      "apiBase": "http://192.168.1.131:11434",
      "provider": "ollama"
    }
  ],
"tabAutocompleteModel": {
    "title": "Ollama Remote",
    "provider": "ollama",
    "apiBase": "http://192.168.1.131:11434",
    "model": "codestral:22b", 
    "completionOptions": {
      "keepAlive": 3000000,
    }
  },

When opening chat, the model is loaded, but when I hop into the editor, the tab autocomplete model is loaded and the previous one is unloaded. This takes a huge amount of time. How can I use the same model for Chat and Autocomplete without reloading the model in Ollama?

craftpip avatar Jun 07 '24 10:06 craftpip

@craftpip Given your config here, it doesn't seem like the unloading/reloading should happen, but if anything it might be that different values of the keepAlive parameter are causing this. It's just the first thing that comes to mind, but I would try adding the same keepAlive value to your chat model.

sestinj avatar Jun 07 '24 17:06 sestinj

This is now possible (in the latest VS Code Pre-release) by using an array for tabAutocompleteModel, and then clicking on the "Continue" button in the status bar.
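
For example, something roughly along these lines (the model entries below are just placeholders):

"tabAutocompleteModel": [
    {
      "title": "Fast Autocomplete",
      "provider": "ollama",
      "model": "deepseek-coder:1.3b-base"
    },
    {
      "title": "Codestral Autocomplete",
      "provider": "ollama",
      "model": "codestral:22b"
    }
  ],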

sestinj avatar Jun 19 '24 14:06 sestinj