
Add the possibility to use offline models (maybe via ollama)

Open nym21 opened this issue 2 years ago • 33 comments

Check for existing issues

  • [X] Completed

Describe the feature

Hi,

Having the possibility to use other models, for example Llama (most likely via Ollama), would be really amazing instead of being forced to use the proprietary and unethical ChatGPT.

Here's a link to their API docs: https://github.com/jmorganca/ollama/blob/main/docs/api.md

Since an API is also used for ChatGPT, it shouldn't be too much work.
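
For context, Ollama's native API is just a local HTTP endpoint, so something roughly like this already works today (the model name is only an example; any locally pulled model would do):

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a hello world program in Rust",
  "stream": false
}'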

If applicable, add mockups / screenshots to help present your vision of the feature

No response

nym21 avatar Oct 14 '23 08:10 nym21

This would be awesome.

This project seems like it could serve as inspiration for the feature:

https://github.com/continuedev/continue

willtejeda avatar Feb 07 '24 19:02 willtejeda

It should be easy to add support for any service that provides an OpenAI-compatible API, such as Perplexity, or LiteLLM for local models.

tjohnman avatar Feb 08 '24 08:02 tjohnman

ollama is now OpenAI-compatible as well: https://ollama.ai/blog/openai-compatibility
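
In practice that means a plain OpenAI-style chat request against the local server already works, for example (the model field is just whatever you have pulled):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'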

octoth0rpe avatar Feb 09 '24 11:02 octoth0rpe

A quantized version of CodeLlama would work well locally on Macs:

https://huggingface.co/TheBloke/CodeLlama-34B-GGUF

ollama has a way of interacting with a quantized CodeLlama, but it's up to the Zed team whether they'd rather use ollama or run llama.cpp within Zed (ollama runs llama.cpp under the hood).
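
For example, pulling one of the pre-quantized library builds is a one-liner (tags are illustrative; check the Ollama library for the exact variants available):

ollama pull codellama:34b                       # library builds come pre-quantized
ollama run codellama:34b "Explain what a borrow checker does"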

IMO this should be more generic than "offline vs. online", and more about giving users a choice of which Copilot-style model they'd like to use. There's a balance for sure!

shaqq avatar Feb 14 '24 21:02 shaqq

A low-effort approach to this feature would be to allow configuring a custom proxy for Copilot. This is already possible today in VSCode's Copilot extension using this config:

"github.copilot.advanced": {
    "debug.testOverrideProxyUrl": "http://localhost:5001",
    "debug.overrideProxyUrl": "http://localhost:5001"
}

It would be really cool to have this tweak available in Zed too

Belluxx avatar Feb 19 '24 20:02 Belluxx

ollama is now OpenAI-compatible as well: https://ollama.ai/blog/openai-compatibility

This is how it's working for me:

I couldn't add my custom model to the "default_open_ai_model" setting; for now, Zed only allows the OpenAI models ("gpt-3.5-turbo-0613", "gpt-4-0613", "gpt-4-1106-preview"), so I had to clone my model under one of those names to proxy it.

Pull and run the Mistral model from the Ollama library, then clone it as gpt-4-1106-preview:

ollama run mistral
ollama cp mistral gpt-4-1106-preview

Added this to my Zed Settings (~/.config/zed/settings.json)

"assistant": {
    "openai_api_url": "http://localhost:11434/v1"
 }

Restart Zed
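
A quick sanity check before touching Zed (optional; names follow the steps above):

ollama list                 # should now show both mistral and gpt-4-1106-preview
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4-1106-preview", "messages": [{"role": "user", "content": "ping"}]}'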

sumanmichael avatar Feb 22 '24 07:02 sumanmichael

@sumanmichael Thank you for the tip. Can you confirm that this only works for the "Assistant Panel" (chat) and "Inline Assist"? I tried to use it for Copilot (theoretically supported, since Copilot uses OpenAI APIs) but with no success, because I don't have a Microsoft/Copilot account.

Is there a way to bypass Zed's login requirement to use Copilot?

Belluxx avatar Feb 23 '24 08:02 Belluxx

Can this also be made to work with any local server running an API similar to OpenAI's API? Specifically, I'm interested in using LM Studio.

taylorgoolsby avatar Feb 26 '24 09:02 taylorgoolsby

Just integrate continue.dev, please. This would exponentially increase adoption, as Continue solves all of these LLM worries and works with all possible providers, both local and cloud.

JayGhiya avatar Feb 26 '24 14:02 JayGhiya

Can this also be made to work with any local server running an API similar to OpenAI's API? Specifically, I'm interested in using LM Studio.

Same here, I'm using LiteLLM which presents an OpenAI-compatible API, and integrates with a bunch of model loaders on the back end (ollama, tgi, etc.). Would be nice to be able to just set a URL and token and have it use my server.
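
For anyone who hasn't used it, the LiteLLM proxy side of that is roughly the following; the flags are from memory, so treat this as a sketch and check the LiteLLM docs:

pip install 'litellm[proxy]'
litellm --model ollama/mistral --port 4000      # serves an OpenAI-compatible API on http://localhost:4000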

oxaronick avatar Feb 26 '24 14:02 oxaronick

Just integrate continue.dev, please. This would exponentially increase adoption, as Continue solves all of these LLM worries and works with all possible providers, both local and cloud.

Yeah, Continue is very flexible. This is what the Continue config looks like:

  "models": [
    {
      "title": "mixtral",
      "provider": "openai",
      "model": "mixtral:8x7b",
      "apiBase": "https://skynet.becomes.self.aware.io:444",
      "apiKey": "sk-somethingsomething"
    }
  ]

I suppose if Zed only supports the OpenAI API it wouldn't need provider, but otherwise that's a nice way to let users configure their server.

oxaronick avatar Feb 26 '24 15:02 oxaronick

Yeah, Continue is very flexible. This is what the Continue config looks like: [...]

So I suppose a couple of QoL improvements could be made here, assuming all the LLMs below are compliant with OpenAI's spec:

  • [ ] Zed should be able to store multiple LLM configs, following the schema used by continue.dev, which, IMHO, looks pretty robust.
  • [ ] Zed should be able to toggle between multiple LLM providers using configs provided by users.

jianghoy avatar Feb 26 '24 19:02 jianghoy

Zed should be able to toggle between multiple LLM providers using configs provided by users.

As a starting point, even the ability to configure one model hosted locally or on my server would be great.

oxaronick avatar Feb 26 '24 20:02 oxaronick

Also, since OpenAI is so proprietary, I do not really feel comfortable with the idea that all these open-source/open-weight models are copying the OpenAI API spec. It would not surprise me if, in the future, an open standard is created rather than relying on OpenAI to set the standard. I'm not saying we should decide on an open spec right here, right now, but I just wanted to point this out and emphasize a need for simplicity.

taylorgoolsby avatar Feb 27 '24 00:02 taylorgoolsby

Also, since OpenAI is so proprietary, I do not really feel comfortable with the idea that all these open-source/open-weight models are copying the OpenAI API spec. It would not surprise me if, in the future, an open standard is created rather than relying on OpenAI to set the standard. I'm not saying we should decide on an open spec right here, right now, but I just wanted to point this out and emphasize a need for simplicity.

The OpenAI API spec is already the norm for many libraries. Ollama has a PR to add a model-listing endpoint soon: https://github.com/ollama/ollama/pull/2476

I would propose keeping it simple and allowing custom model names if the default API is anything other than OpenAI.

franz101 avatar Feb 27 '24 21:02 franz101

ollama is now OpenAI-compatible as well: https://ollama.ai/blog/openai-compatibility

This is how it's working for me:

I couldn't add my custom model to the "default_open_ai_model" setting; for now, Zed only allows the OpenAI models ("gpt-3.5-turbo-0613", "gpt-4-0613", "gpt-4-1106-preview"), so I had to clone my model under one of those names to proxy it.

Pull and run the Mistral model from the Ollama library, then clone it as gpt-4-1106-preview:

ollama run mistral
ollama cp mistral gpt-4-1106-preview

Added this to my Zed Settings (~/.config/zed/settings.json)

"assistant": {
    "openai_api_url": "http://localhost:11434/v1"
 }

Restart Zed

I have tried this solution with Mistral running locally with ollama. It doesn't work for me. Did anybody else actually make this work?

thosky avatar Mar 04 '24 09:03 thosky

I have tried this solution with Mistral running locally with ollama. It doesn't work for me. Did anybody else actually make this work?

Works for me as described, using codellama:7b-instruct or mistral. I just did a short test in a test repo. (Screenshots attached.)

janerikmai avatar Mar 04 '24 20:03 janerikmai

How would Zed know what a model's token limit is?

Also, as a side note, some models use different tokenizers. Some well-known ones are BPE, SentencePiece, and CodeGen. Counting tokens using the wrong tokenizer would produce inaccurate counts.
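
As a rough illustration, OpenAI's tiktoken happily counts tokens for GPT-4, but that number is only an approximation of what a local model with a different tokenizer (Mistral, CodeLlama, etc.) actually sees:

pip install tiktoken
python -c 'import tiktoken; enc = tiktoken.encoding_for_model("gpt-4"); print(len(enc.encode("hello world")))'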

taylorgoolsby avatar Mar 05 '24 02:03 taylorgoolsby

@janerikmai Thanks for confirming it's working. It's working for me as well now, after reinstalling Ollama. The old version might not have supported the OpenAI-compatible API.

thosky avatar Mar 06 '24 09:03 thosky

If you want to use another model available on Hugging Face that isn't native to Ollama (DeepSeek-Coder, WizardCoder, etc.), you can explicitly give it a Zed-compatible name when creating it from a GGUF via a Modelfile:

ollama create gpt-4-1106-preview -f Modelfile
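
For example, the Modelfile can be as small as a single FROM line pointing at the downloaded GGUF (the file name below is hypothetical), after which the create command above works as-is:

cat > Modelfile <<'EOF'
FROM ./deepseek-coder-6.7b-instruct.Q5_K_M.gguf
EOF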

james-haddock avatar Mar 13 '24 17:03 james-haddock

After https://github.com/zed-industries/zed/pull/8646, to make a local LLM work you need to add this to your Zed settings (~/.config/zed/settings.json):

  "assistant": {
    "provider": {
      "type": "openai",
      "api_url": "http://localhost:11434/v1"
    }
  }

At least it works for me

Krukov avatar Mar 15 '24 10:03 Krukov

How is this working with the OpenAI calls for ada embeddings? Or is that just dysfunctional?

crates/ai/src/providers/open_ai/embeddings.rs

impl OpenAiEmbeddingProvider {
    pub async fn new(client: Arc<dyn HttpClient>, executor: BackgroundExecutor) -> Self {
        let (rate_limit_count_tx, rate_limit_count_rx) = watch::channel_with(None);
        let rate_limit_count_tx = Arc::new(Mutex::new(rate_limit_count_tx));

        // Loading the model is expensive, so ensure this runs off the main thread.
        let model = executor
            .spawn(async move { OpenAiLanguageModel::load("text-embedding-ada-002") })
            .await;
        let credential = Arc::new(RwLock::new(ProviderCredential::NoCredentials));

        OpenAiEmbeddingProvider {
            model,
            credential,
            client,
            executor,
            rate_limit_count_rx,
            rate_limit_count_tx,
        }
    }
    // ... additional code
 }
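
(For what it's worth, Ollama does expose a native embeddings endpoint that could in principle back this, though nothing in the code above points at it. The model name below is just an example.)

curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "fn main() {}"}'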

versecafe avatar Mar 18 '24 23:03 versecafe

For what it's worth this is what I needed to do to make it work locally:

  "assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "api_url": "http://localhost:11434/v1"
    }
  }

Then pick a model: ollama run mistral and then ollama cp mistral gpt-4-turbo-preview. Restart Zed and enjoy.

andreicek avatar Mar 21 '24 07:03 andreicek

I tried this and I get a prompt to enter an OpenAI API key. I seem blocked even though I have the assistant config mentioned earlier.

If I enter a junk key, it doesn't help; I then get errors about not being able to connect to OpenAI. So the settings apparently aren't working?

My $HOME/.config/zed/settings.json (Zed version 0.128.3):

{
  "ui_font_size": 16,
  "buffer_font_size": 16,
  "assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "api_url": "http://localhost:11434/v1"
    }
  }
}

After I add a valid OpenAI API key, things seem to work. I choose gpt-4-turbo, which I set up with ollama: ollama cp mistral gpt-4-turbo-preview.

With that, I try a prompt in a file and it seems my local LLM is too slow; Zed says: "request or operation took longer than the configured timeout time". I don't get any auto-complete suggestions for a timeout entry under the assistant or provider JSON settings...

I see this timeout is a known issue: https://github.com/zed-industries/zed/issues/9913

craigcomstock avatar Mar 28 '24 14:03 craigcomstock

~Looks like there are some new required settings,~ this does the trick for me:

  "assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "type": "openai",
      "default_model": "gpt-4-turbo-preview",
      "api_url": "http://localhost:11434/v1"
    }
  }
  

Edit: maybe I just lost the API key after a restart 😅

duggan avatar Mar 28 '24 15:03 duggan

Here is a complete rundown of how I got it to work after collecting all the pieces of information in this thread. Note: I run Ollama on a Mac; if your setup differs, you'll have to adapt it.

  1. Add the following configuration to your Zed config:
"assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "type": "openai",
      "default_model": "gpt-4-turbo-preview",
      "api_url": "http://localhost:11434/v1"
    }
  }
  2. Download Mistral via the ollama CLI:
ollama run mistral
  3. Copy the downloaded Mistral, changing its name:
ollama cp mistral gpt-4-turbo-preview
  4. Add the OpenAI API key to Zed (source: https://ollama.com/blog/openai-compatibility); Ollama doesn't check the key, so a placeholder value works:
ollama
  5. Restart Zed to ensure everything is working as it is supposed to.

I hope this helps!

ednanf avatar Apr 13 '24 13:04 ednanf

I wish Zed could provide an easy way to point to a server API like the Continue extension does. Here is an example from my config file in VSCodium:

"models": [
    {
      "title": "Mistral",
      "model": "mistral-7b-instruct-v0.2-code-ft.Q5_K_M",
      "contextLength": 4096,
      "provider": "lmstudio"
    }
  ],
"tabAutocompleteModel": {
    "title": "Tab Autocomplete Model",
    "provider": "lmstudio",
    "model": "mistral-7b-instruct-v0.2-code-ft.Q5_K_M",
    "apiBase": "http://localhost:1234/v1/"
  },

LM Studio supports multiple endpoints for different contexts:

GET /v1/models
POST /v1/chat/completions
POST /v1/embeddings            
POST /v1/completions
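
So, for instance, the chat endpoint can be exercised directly against LM Studio's local server (the model field is whatever model is currently loaded):

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-7b-instruct-v0.2-code-ft.Q5_K_M", "messages": [{"role": "user", "content": "hello"}]}'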

I followed @sumanmichael's steps (thank you so much) to make my local Mistral work, but while the chat box works flawlessly, code completion is still messy compared to the exact same model running through Continue in VSCodium.

Being able to manually set an API endpoint, instead of letting Zed concatenate '/completion' onto an assumed OpenAI server address, would allow any type of installation.

matthieuHenocque avatar Apr 21 '24 12:04 matthieuHenocque

Yeah, Continue is very flexible. This is what the Continue config looks like:

  "models": [
    {
      "title": "mixtral",
      "provider": "openai",
      "model": "mixtral:8x7b",
      "apiBase": "https://skynet.becomes.self.aware.io:444",
      "apiKey": "sk-somethingsomething"
    }
  ]

fwiw, I managed to get Continue to generate this very useful config fragment for using Ollama:

"models": [
    {
      "model": "AUTODETECT",
      "title": "Ollama",
      "completionOptions": {},
      "apiBase": "http://localhost:11434",
      "provider": "ollama"
    }
  ],

You need to restart VSCode when you add a model to Ollama, but at least you don't need to add another config fragment... very nice.

psyv282j9d avatar Apr 25 '24 11:04 psyv282j9d

Vote for this feature. Copilot and LSP should be plugins, so users can make choices based on their own needs.

aohan237 avatar May 09 '24 03:05 aohan237

Here is a complete rundown of how I got it to work after collecting all the pieces of information in this thread: Note: I run Ollama on Mac, if you run anything differently, you have to adapt it.

[snip]

4. Add the OpenAI API key to Zed (source: https://ollama.com/blog/openai-compatibility):
ollama
5. Restart Zed to ensure everything is working as it is supposed to.

I hope this helps!

It gives hope, at least. Can you please tell us how to define the key in Ollama and the exact syntax for adding it to the Zed config?

Trying to add a key in the assistant panel results in the following in the console: crates/assistant/src/completion_provider/open_ai.rs:255: DBus error Prompt was dismissed

dagbdagb avatar May 11 '24 20:05 dagbdagb