Agent Mode for 3rd party Model Providers
The new Agent Mode is great. The ability to add custom model providers is great too. A huge drawback is that Agent Mode is not supported when using Ollama or other providers. Adding Agent Mode support for third-party models would further improve the experience for users:
- Local inference engines such as Ollama, llama.cpp server, vLLM, etc. all support tool calling if the loaded model supports it. LM Studio even shows which models support tool calling with a small icon in the model list.
- Therefore I guess they either maintain this information for popular models manually, or more likely detect tool calling support by parsing the GGUF file format - not sure.
- Ollama itself exposes model information through the `/api/show` endpoint, including a capabilities list. I'm not sure whether tool calling is exposed as a capability there; a quick check is sketched below.
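A minimal way to check this against a local Ollama instance; just a sketch and an assumption on my side: whether the response contains a `capabilities` list with a `"tools"` entry may depend on the Ollama version, and the model name is only an example.

```python
# Sketch: query a local Ollama instance via /api/show and look for a "tools"
# entry in the reported capabilities (field presence may depend on the Ollama
# version; the model name is an example).
import json
import urllib.request

def model_supports_tools(model: str, host: str = "http://localhost:11434") -> bool:
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        info = json.loads(resp.read())
    return "tools" in info.get("capabilities", [])

print(model_supports_tools("qwen2.5-coder:32b"))
```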
TL;DR: Allow the user to use Agent Mode even with third-party model providers like Ollama. If the loaded model doesn't support tool calling, an error can be surfaced. There's no need for VS Code to check for tool calling capability.
@underlines We are getting the capabilities from the endpoint. And it does work, I just find many of the models don't like to call tools very much
Here it is failing:
Here it is succeeding:
@lramos15 Thanks for answering.
> We are getting the capabilities from the endpoint.

- Does this mean GitHub Copilot currently fetches capabilities and only exposes models with tool calling capability in the dropdown when Agent Mode is selected? Or do we need to be on an Insiders build for this? Stable doesn't show any Ollama-fetched models for Agent Mode.
- An Insiders build would probably allow us to use the `github.copilot.chat.byok.ollamaEndpoint` setting as well, right?
- Does GitHub Copilot properly set a larger context window using `"options": {"num_ctx": 32000}`? The context size strongly affects VRAM usage and should probably be user-editable. If GitHub Copilot leaves it at the default by not setting it, that would be 4000, which means the agent starts hallucinating pretty fast as it loses the initial context. Agent workflows need at least 16k context.

For your tool calling issues:

- What Qwen2.5-Coder model did you load? Anything below 24B parameters will be miserable at tool calling, in my experience.
- According to the Function Calling Leaderboard, Mistral Small is one of the top open models for tool calling.
- Insiders, but everything that goes into insiders comes to stable the following month
- Insiders as well
- We do not. This is something you will need to do before running `ollama serve`, or in the Modelfile, because we're using the OpenAI-compatible Ollama API to request chat completions and it doesn't support `num_ctx` (see the sketch after this list). https://www.ollama.com/blog/openai-compatibility
- This was 32b I believe
- I've tried that one as well without a ton of luck. I'm currently using an RTX 3090; maybe I don't have enough VRAM.
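For illustration, here is roughly what baking a larger context window into a derived model looks like, so that OpenAI-compatible clients (which cannot send `num_ctx`) still get it. This is only a sketch under assumptions: the `ollama` CLI is on PATH, and the model names and context size are examples.

```python
# Sketch: create a derived Ollama model with a larger context window baked in,
# equivalent to writing a Modelfile by hand and running `ollama create`.
# Model names and the context size are examples.
import subprocess
import tempfile
from pathlib import Path

BASE_MODEL = "qwen2.5-coder:32b"          # example base model
DERIVED_MODEL = "qwen2.5-coder:32b-16k"   # example name for the variant
NUM_CTX = 16000

modelfile = f"FROM {BASE_MODEL}\nPARAMETER num_ctx {NUM_CTX}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Modelfile"
    path.write_text(modelfile)
    # Same as: ollama create qwen2.5-coder:32b-16k -f Modelfile
    subprocess.run(["ollama", "create", DERIVED_MODEL, "-f", str(path)], check=True)
```

Alternatively, setting the `OLLAMA_CONTEXT_LENGTH` environment variable before `ollama serve` applies a context length globally to every loaded model.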
@lramos15 Thanks for answering my previous questions—really helpful!
- I installed the latest VS Code Insiders build on Windows, checked for updates, and added one Ollama model and one OpenRouter model. Neither model shows up when switching to Agent Mode. What am I doing wrong?
- Context window: I'm still unclear on how to persistently set `num_ctx` for specific local models in GitHub Copilot when using Ollama. As far as I understand, the current options are:
  - System variable: `OLLAMA_CONTEXT_LENGTH=16000` sets it globally, but it's inflexible; it can't be changed per model or at runtime.
  - CLI `ollama run`: using `/set num_ctx 16000` works, but it needs to be run before every model load in Copilot, which is manual and unintuitive.
  - Modelfile: creating a new Modelfile with `PARAMETER num_ctx` from the original model seems cumbersome and not ideal for quick changes.
  - API: sending `"options": {"num_ctx": 16000}` in API requests works well and is how tools like Open WebUI handle it (see the sketch below). It looks like `num_ctx` should be handled by the calling application, i.e. GitHub Copilot via the options, which could also be stored as a VS Code setting so per-model options persist.

Given these limitations, is there a better way to define per-model `num_ctx` settings when using Ollama locally with Copilot?
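To illustrate the API option, here is roughly what a calling application would send to Ollama's native chat endpoint. A minimal, non-streaming sketch, assuming Ollama is on the default port; the model name and prompt are examples.

```python
# Sketch: pass num_ctx per request via Ollama's native /api/chat endpoint
# (the mechanism tools like Open WebUI use; values here are examples).
import json
import urllib.request

payload = {
    "model": "qwen2.5-coder:32b",
    "messages": [{"role": "user", "content": "Summarize what num_ctx controls."}],
    "stream": False,
    "options": {"num_ctx": 16000},  # per-request context window
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["message"]["content"])
```

The OpenAI-compatible `/v1/chat/completions` endpoint has no equivalent field, which is the limitation discussed above.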
I think you've listed all the possible options. I personally use OLLAMA_CONTEXT_LENGTH, and only load one model at a time as I don't have enough VRAM to load multiple.
We cannot do `num_ctx` due to the aforementioned OpenAI-compatible API surface. We would need to switch to the Ollama-specific API, which can be a bit challenging given that so many of our bring-your-own-key providers are built on top of OpenAI-compatible specs.
@lramos15
Sorry, I ninja edited my previous answer:
- I installed the latest VS Code Insiders build on Windows, checked for updates, and added one Ollama model and one OpenRouter model. Neither model shows up when switching to Agent Mode. What am I doing wrong?
- Added OpenRouter models
- They show up in Edit mode
- After switching to Agent Mode, no Ollama or OpenRouter models are shown:
99% of the free OpenRouter models don't support tool calls, so Agent Mode cannot work with them, since it requires tool calling. But as you can see, Gemini 2.5 Pro Experimental (free) is there.
Oh, I get it now; you said it in the beginning: GitHub Copilot actually respects the capabilities and only allows you to use models whose capabilities are set properly. If we want to try models that didn't set the tool calling capability, we would have to recreate an Ollama Modelfile with it set. Right?
@underlines Yes, but you'll likely have a bad experience if the model doesn't actually support tools.
I stumbled upon the same issue and coded up a little proxy script that logs the requests and changes `"max_tokens": 4096` to `"options": {"num_predict": 4096, "num_ctx": 70000}` on the fly for the `v1/chat/completions` endpoint. That helps with Ollama.
https://github.com/michael-heinrich/llm_proxy/blob/main/ollama_proxy.py
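Not the linked script itself, but a minimal sketch of the same idea for anyone curious: a small stdlib-only proxy that rewrites the request body before forwarding it to Ollama. The ports and context size are assumptions, and this sketch only handles non-streaming responses.

```python
# Sketch of a rewriting proxy: point the client at http://localhost:11435
# instead of Ollama, and the proxy replaces max_tokens with Ollama-native
# options (num_predict, num_ctx) before forwarding. Non-streaming only;
# ports and values are assumptions.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

OLLAMA = "http://localhost:11434"
NUM_CTX = 70000

class RewritingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if self.path == "/v1/chat/completions":
            # Move the token limit into Ollama's native options block.
            max_tokens = body.pop("max_tokens", 4096)
            body["options"] = {"num_predict": max_tokens, "num_ctx": NUM_CTX}
        data = json.dumps(body).encode("utf-8")
        upstream_req = urllib.request.Request(
            OLLAMA + self.path, data=data,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(upstream_req) as upstream:
            payload = upstream.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 11435), RewritingProxy).serve_forever()
```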
In my case, Gemini 2.5 Flash Preview doesn't show up in Agent Mode on VS Code Insiders; it is shown in Ask and Edit mode. Doing the same thing on the stable release, the model is shown in Agent Mode.
When adding a custom model to VS Code stable, it asks whether the model supports tool calling. I don't see this option when adding a custom model to VS Code Insiders.
In summary, the experience could be improved. I can think of the following improvements:
- When adding custom LLM providers/models, ask the user whether the model supports tool calling (there are models that support it without reporting it in their capabilities)
- Since @lramos15 mentioned the project wants to keep the generic OpenAI-compatible API and not the Ollama-specific endpoint, renaming the custom provider from "Ollama" to "OpenAI compatible" would clear things up.
- Allow users to set a custom endpoint host so that everybody can use their favorite inference engine (vLLM, llama.cpp server, Ollama, etc.)
Relevant: https://github.com/microsoft/vscode-copilot-release/issues/7289
How large a context window does Copilot need to operate effectively as an agent? I noticed that it runs in loops when I forget to set it to 16k. I'd like to know whether it works better with a larger context window, and whether there is a required minimum size.
Now that agentic capabilities are becoming more common in open-source models such as Qwen 3, this would be a prudent capability to revisit adding. Let's enable Agent Mode for Ollama-based model hosting.
+1
+1
A lot of the +1 comments should realize that this already works; it depends on the provider advertising tool calling capabilities to us.
@lramos15 Based on what I'm seeing, I've tried dozens of Ollama models in GitHub Copilot in VS Code, and none of them have Agent Mode available. Here for example, this is explicitly an agentic coding model, but when loaded into Ollama and VS Code's GitHub Copilot it only allows Ask or Edit, not Agent Mode: https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
I was able to use qwen3:8b (500a1f067a9f) with Copilot version 1.350.0 and VS Code 1.102.3, but it displays a lot of thinking (first
We really need this! Qwen3-coder is extremely fast, with a token rate of 2,000 tokens per second. It also has similar code quality to Claude 4 Sonnet.
Any reason why Ollama models aren't supported by the agent in VS Code yet? I can use Ollama only for Ask mode and not for Agent mode.
Closing as this does in fact work. Your model must advertise tool calling capabilities. Ollama also often doesn't launch with a large enough context window, so make sure you're launching it with the appropriate environment variables to increase the context window.
I recommend using OpenRouter; it appears to work best in Agent Mode for me.
Do you know which public models are known to work for Agent Mode with Copilot?
@prusswan I've tested that "DeepSeek: R1 0528" (deepseek/deepseek-r1-0528) on OpenRouter works well in Agent Mode. But I've found a gotcha: you must explicitly tell it to edit files, otherwise it might output in the chat panel instead of calling tools. Example:
#codebase For grammar strings that have indent in them (e.g. the one in tests\v2\test_grammar_validator.py:test_rule_with_no_collision), and are passed to (parser class).from_text, edit the relevant files to use textwrap.dedent for the grammar string, in the form of
from textwrap import dedent.
I haven't tested other models.
Hello, I’ve gone through all the messages posted (I came across the first mention about setting a custom URL for Ollama).
Regarding the compatibility of your model with agent mode, I’d like to suggest a solution. Basically, when selecting models in the model manager and going to the Ollama section, it would be helpful if VS Code displayed all models, but grayed out the ones that aren’t compatible with agent mode, or added a small disclaimer to warn users. Ideally, it could also redirect them to a documentation page explaining this in more detail.
I also think that perhaps we should reopen this issue, which, in my opinion, is not finished.
Thanks in advance.