
Support using ollama as an inline_completion_provider

Open aitorpazos opened this issue 1 year ago • 17 comments

Check for existing issues

  • [X] Completed

Describe the feature

I am successfully using my local Ollama models in the assistant panel.

I would love to be able to use them as well as an inline_completion_provider.

Currently, only none, copilot or supermaven values are supported.

If applicable, add mockups / screenshots to help present your vision of the feature

No response

aitorpazos avatar Aug 08 '24 10:08 aitorpazos

Some considerations that come to mind regarding this, keeping in mind the current support for Ollama:

In contrast to something like GitHub Copilot, whose entire purpose is to provide inline completions, some of the most popular models used with Ollama, such as llama3.1, DO NOT support inline completion. For those that do, such as deepseek-coder-v2, you can make a POST request to /api/generate like:

{
  "model": "deepseek-coder-v2:latest",
  "prompt": "time := time.",   
  "suffix": "    return time;",
  "options": {
    "temperature": 0
  },
  "keep_alive": -1,
  "stream": false
}

And get back a response such as:

{
  "model": "deepseek-coder-v2:latest",
  "created_at": "2024-09-07T14:28:17.013718016Z",
  "response": "Now().Unix()\n",   // our inline completion, inserted between prompt and suffix
  "done": true,
  "done_reason": "stop",
  // metadata fields omitted 
}
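
For reference, here is a minimal sketch of issuing that same request from a script (Python with the requests library; any HTTP client works the same way), assuming a local Ollama server with the model already pulled:

import requests

# Fill-in-the-middle request: "prompt" is the code before the cursor,
# "suffix" is the code after it, matching the JSON example above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder-v2:latest",
        "prompt": "time := time.",
        "suffix": "    return time;",
        "options": {"temperature": 0},
        "keep_alive": -1,
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
# The completion text to insert between prompt and suffix, e.g. "Now().Unix()\n".
print(resp.json()["response"])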

For models that do not support inline completions, the above request results in an error, e.g.:

{
  "error": "llama3.1:latest does not support insert"
}

However, it would be desirable to use, say, llama3.1 for chat while using deepseek-coder-v2 for inline completion at the same time; therefore the list of Ollama inline completion models should be kept separate from the list of chat models.

Another consideration is how much sense it would make to support remote Ollama instances for inline completions. I've got Ollama running both locally on my laptop and on a server on my LAN to get access to more powerful models. Anecdotally, the difference in response times (local ~100ms vs the server ~300ms) means that using the LAN server for inline completions is probably impractical, whereas using it for chat is just fine. It also means it would be nice if the chat Ollama provider and the inline completion Ollama provider did not have to share the same base URL.

MatejLach avatar Sep 07 '24 14:09 MatejLach

This repo was mentioned in #14134. I am including it here so it doesn't get forgotten and for easier reference.


Proxy that allows you to use Ollama as a copilot, like GitHub Copilot

https://github.com/bernardo-bruning/ollama-copilot

Written in Go

skewty avatar Sep 29 '24 03:09 skewty

@skewty Seems like that won't work? https://github.com/zed-industries/zed/issues/6701

xmaayy avatar Oct 04 '24 14:10 xmaayy

As mentioned in #16030, to address your concern @MatejLach, we should be able to configure an autocomplete model alongside a chat model.

Copy-pasting from the issue:

Here's an example of the config.json in Continue.dev for changing the autocomplete model:

  "tabAutocompleteModel": {
    "title": "Tab Autocomplete Model",
    "provider": "ollama",
    "apiBase": "http://localhost:11434",
    "model": "deepseek-coder:6.7b",
    "contextLength": 8192
  },
  "tabAutocompleteOptions": {
    "prefixPercentage": 0.5,
    "maxPromptTokens": 4096
  },

tlvenn avatar Oct 08 '24 04:10 tlvenn

@tlvenn this config is for continue.dev, and that extension is not present in the Zed editor, so how would that work in Zed?

navidRashik avatar Dec 05 '24 06:12 navidRashik

I just started testing Zed, and I think this is a really necessary and important feature for transitioning from VS Code with continue.dev.

F1shez avatar Jan 04 '25 19:01 F1shez

This feature is a must if I am going to move from Cursor. GitHub Copilot is not good enough, so I at least want to have more options.

jorge-menjivar avatar Jan 18 '25 23:01 jorge-menjivar

Another consideration is how much sense it would make to support remote Ollama instances for inline completions

Not sure if there are any specific concerns with Ollama, but generally speaking I think it makes perfect sense to support remote instances. For instance, I sometimes use an old quad-core Celeron laptop (with 4 GB RAM and no GPU) for coding. It's completely useless for LLM inference, but I have a 12-core Linux server with 32 GB RAM and a GTX 1080 GPU on my LAN, and it serves up 7B models in llama.cpp without breaking a sweat. I'm also quite confident that my LAN does not add any noticeable lag. I for one would prefer running models on a locally hosted remote server.

mbitsnbites avatar Feb 09 '25 10:02 mbitsnbites

Another consideration is how much sense it would make to support remote Ollama instances for inline completions

Not sure if there are any specific concerns with ollama, but generally speaking I think it makes perfect sense to support remote instances.

I think, given that Ollama is an HTTP API, the specifics of where you host your Ollama instance are not really Zed's concern, right?
Whether you point Zed to a localhost URL or an IP three countries over, Zed's only ask is that the API is accessible.

mcmacker4 avatar Feb 09 '25 12:02 mcmacker4

IMO, the feature (Ollama as an inline_completion_provider) is very important to every developer working on a LAN.

kylelee avatar Feb 14 '25 10:02 kylelee

Has anyone figured out a decent workaround for this issue in the meantime?

xxfogs avatar Apr 10 '25 18:04 xxfogs

With https://github.com/zed-industries/zed/pull/24364 now merged, you might give https://github.com/bernardo-bruning/ollama-copilot a try.

obrhoff avatar Apr 10 '25 19:04 obrhoff

With #24364 now merged, you might give bernardo-bruning/ollama-copilot a try.

Is there a way to configure the relevant proxy settings in Zed so this can work with features.inline_completion_provider set to copilot?

quantum9Innovation avatar Apr 12 '25 23:04 quantum9Innovation

With #24364 now merged, you might give https://github.com/bernardo-bruning/ollama-copilot a try.

Thanks for the idea, but I haven't had much luck with it so far.

xxfogs avatar Apr 13 '25 15:04 xxfogs

This feature is the only reason I haven't switched to ZED yet. I would love to see it. Thank you for your hard work.

saleh-mir avatar May 09 '25 11:05 saleh-mir

Just my two cents. I often use Zed when editing sensitive data, and I can't use inline completions at all because of that. I just don't think it's safe to rely on third-party completion providers when dealing with credentials.

rcny avatar May 14 '25 19:05 rcny

Just my two cents. I often use Zed when editing sensitive data, and I can't use inline completions at all because of that. I just don't think it's safe to rely on third-party completion providers when dealing with credentials.

That is exactly the main reason why I need it.

saleh-mir avatar May 14 '25 21:05 saleh-mir

Adding Google juice because the term inline_completion_provider no longer shows up in Zed's default settings:

Support using ollama as edit_prediction_provider

tv42 avatar Jul 05 '25 16:07 tv42

Hello. I am trying Zed right now. I would just like to note that Ollama does in fact support completion models, except you have to be very specific about how you "feed" them, and it has to happen with specific models, in particular models like qwen2.5-coder:1.5b-base and the like with that -base tag. There are several others; a look at the continue.dev docs would give you an idea.

But basically, before the instruct step, I think all models get trained for completion, like GPT-2 was and even GPT-3 was, as per their papers.

At the moment I have only just looked at Zed, so I am not really committed to switching over or bothering to contribute. But maybe? Don't know...

gabrielesilinic avatar Aug 25 '25 20:08 gabrielesilinic

There is currently a functional PR here: https://github.com/zed-industries/zed/pull/33616

ThePerfectComputer avatar Sep 02 '25 19:09 ThePerfectComputer

@gabrielesilinic some "coder" models get FIM (Fill-In-the-Middle) training with a specific format; qwen2.5-coder is a good example. They expect input in this format:

<|fim▁begin|>
...code before...
<|fim▁hole|>
...code after...<|fim▁end|>

And they output what should go in the <|fim▁hole|> place, paying attention not only to the "code before" the completion point, but also to the "code after" (so the model-provided code makes more sense in context).

Now if you use "base" models, they will do the job (even Llama3.1 or any other), but you'll have to use them without any special FIM formatting, so they'll only take in the "code before" the completion point. The result will be of somewhat lesser quality (as the model will have no information about the "code after"), but it'll work - I tried it with different models many times. Instruction-tuned models don't work well at all; they tend to add comments/confirmations/other chatter, not just code.
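
To make that concrete, here is a minimal sketch of sending a raw FIM prompt through Ollama's /api/generate (Python with the requests library). The model tag is just a placeholder, and the FIM token strings below follow the format quoted above; they differ between model families, so check the model card for the exact tokens your base model expects.

import requests

# Code surrounding the cursor position.
prefix = "func now() int64 {\n    return time."
suffix = "\n}"

# FIM tokens in the format quoted above; verify the exact strings for your model.
fim_prompt = f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder-v2:latest",  # placeholder: any FIM-trained model you have pulled
        "prompt": fim_prompt,
        "raw": True,  # skip Ollama's prompt template and pass the text through as-is
        "options": {"temperature": 0},
        "stream": False,
    },
)
# The text predicted for the <|fim▁hole|> position.
print(resp.json()["response"])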

grigoriy-a avatar Sep 13 '25 16:09 grigoriy-a

@ThePerfectComputer @oliverbarnes Guys, why this deep Ollama integration? The whole LLM world just works with the OpenAI API as the standard, and that's more than enough. Even Ollama itself supports the OpenAI API. The OpenAI API would allow using not just Ollama and OpenAI, but also llama.cpp, llama-server, vLLM, OpenRouter, LM Studio, and god knows what else. No offense. And I might have misunderstood something from the PR code I saw.
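
For illustration, a sketch of the OpenAI-compatible route against Ollama's /v1 endpoint (Python with requests; the base URL and model name are placeholders). The same request shape works against llama-server, vLLM, LM Studio, and the rest.

import requests

# Any OpenAI-compatible server: Ollama (shown here), llama.cpp's llama-server, vLLM, LM Studio, ...
base_url = "http://localhost:11434/v1"

resp = requests.post(
    f"{base_url}/chat/completions",
    json={
        "model": "llama3.1:latest",  # whatever model the server has loaded
        "messages": [
            {"role": "user", "content": "Write a Go function that returns the current Unix time."}
        ],
        "temperature": 0,
    },
)
print(resp.json()["choices"][0]["message"]["content"])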

grigoriy-a avatar Sep 13 '25 16:09 grigoriy-a

@grigoriy-a that's a fair question, and it has been brought up on the PR as well.

I see the argument for having a generic OpenAI-compatible provider, but right now both the inline assistant and agent Ollama implementations are native, and integrations with providers like LM Studio, though OpenAI-compatible, are done piecemeal.

So the PR tries to be consistent with these implementation patterns, respecting the team's previous decisions, for which there might be reasons we're not aware of.

I get the frustration in waiting for individual implementations if they're simply OpenAI-compatible, though. Maybe worth creating a new discussion about it here on GitHub?

oliverbarnes avatar Sep 16 '25 10:09 oliverbarnes

Hi there! 👋 We're working to clean up our issue tracker by closing older bugs that might not be relevant anymore. If you are able to reproduce this issue in the latest version of Zed, please let us know by commenting on this issue, and it will be kept open. If you can't reproduce it, feel free to close the issue yourself. Otherwise, it will close automatically in 14 days. Thanks for your help!

github-actions[bot] avatar Nov 19 '25 07:11 github-actions[bot]

3 thumbs down and no one commented 🤔

andreymal avatar Nov 19 '25 09:11 andreymal

I don't know why no one cares about using local models

saleh-mir avatar Nov 19 '25 10:11 saleh-mir

Keep this open 👁️

0x524c avatar Nov 19 '25 12:11 0x524c

I don't know why no one cares about using local models

I do.

pokatomnik avatar Nov 19 '25 14:11 pokatomnik

I would like the ability to use a local model for Inline Completion, Agent Thread, Text Thread, and Edit Prediction, so you can still ask questions, get predictions, and generate commit messages offline if necessary.

It seems like it doesn't remember your Ollama selection for commit message generation, and you have to manually connect Ollama via the Agent menu options each time you start up Zed.

It also seems like Zed doesn't release my RAM after commit message generation. When I use a local Ollama model in JetBrains products, my RAM stays engaged (8-16 GB) for a while after a successful request completes, and then after some timeout it gets released again.

mite404 avatar Nov 19 '25 14:11 mite404