Agent Mode for 3rd party Model Providers
The new Agent Mode is great. The ability to add custom model providers is great too. A huge drawback is that Agent Mode is not supported when using Ollama or other providers. Adding Agent Mode support for third-party models would further improve the experience for users:
- Local inference engines such as Ollama, llama.cpp server, vLLM, etc. all support tool calling if the loaded model supports it. LM Studio even shows which models support tool calling with a small icon in the model list.
- Therefore I guess they either maintain this information for popular models manually, or more likely detect tool calling support by parsing the GGUF file format - not sure.
- Ollama itself exposes model information through the `/api/show` endpoint, including a capabilities list. I'm not sure whether tool calling is exposed as a capability there; a quick check is sketched below.
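A minimal way to check this against a local Ollama instance; just a sketch and an assumption on my side: whether the response contains a `capabilities` list with a `"tools"` entry may depend on the Ollama version, and the model name is only an example.

```python
# Sketch: query a local Ollama instance via /api/show and look for a "tools"
# entry in the reported capabilities (field presence may depend on the Ollama
# version; the model name is an example).
import json
import urllib.request

def model_supports_tools(model: str, host: str = "http://localhost:11434") -> bool:
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        info = json.loads(resp.read())
    return "tools" in info.get("capabilities", [])

print(model_supports_tools("qwen2.5-coder:32b"))
```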
TL;DR: Allow the user to use Agent Mode even with third-party model providers like Ollama. If the loaded model doesn't support tool calling, an error can be surfaced. There's no need for VS Code to check for tool calling capability.
@underlines We are getting the capabilities from the endpoint. And it does work, I just find many of the models don't like to call tools very much
Here it is failing:
Here it is succeeding:
@lramos15 Thanks for answering.
> We are getting the capabilities from the endpoint.

- Does this mean GitHub Copilot currently fetches capabilities and only exposes models with tool calling capability in the dropdown when Agent Mode is selected? Or do we need to be on an Insiders build for this? Stable doesn't show any Ollama-fetched models for Agent Mode.
- An Insiders build would probably allow us to use the `github.copilot.chat.byok.ollamaEndpoint` setting as well, right?
- Does GitHub Copilot properly set a larger context window using `"options": {"num_ctx": 32000}`? The context size strongly affects VRAM usage and should probably be user-editable. If GitHub Copilot leaves it at the default by not setting it, that would be 4000, which means the agent starts hallucinating pretty fast as it loses the initial context. Agent workflows need at least 16k context.

For your tool calling issues:

- What Qwen2.5-Coder model did you load? Anything below 24B parameters will be miserable at tool calling, in my experience.
- According to the Function Calling Leaderboard, Mistral Small is one of the top open models for tool calling.
- Insiders, but everything that goes into insiders comes to stable the following month
- Insiders as well
- We do not. This is something you will need to do before running `ollama serve`, or in the Modelfile, because we're using the OpenAI-compatible Ollama API to request chat completions and it doesn't support `num_ctx` (see the sketch after this list). https://www.ollama.com/blog/openai-compatibility
- This was 32b I believe
- I've tried that one as well without a ton of luck. I'm currently using an RTX 3090; maybe I don't have enough VRAM.
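For illustration, here is roughly what baking a larger context window into a derived model looks like, so that OpenAI-compatible clients (which cannot send `num_ctx`) still get it. This is only a sketch under assumptions: the `ollama` CLI is on PATH, and the model names and context size are examples.

```python
# Sketch: create a derived Ollama model with a larger context window baked in,
# equivalent to writing a Modelfile by hand and running `ollama create`.
# Model names and the context size are examples.
import subprocess
import tempfile
from pathlib import Path

BASE_MODEL = "qwen2.5-coder:32b"          # example base model
DERIVED_MODEL = "qwen2.5-coder:32b-16k"   # example name for the variant
NUM_CTX = 16000

modelfile = f"FROM {BASE_MODEL}\nPARAMETER num_ctx {NUM_CTX}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Modelfile"
    path.write_text(modelfile)
    # Same as: ollama create qwen2.5-coder:32b-16k -f Modelfile
    subprocess.run(["ollama", "create", DERIVED_MODEL, "-f", str(path)], check=True)
```

Alternatively, setting the `OLLAMA_CONTEXT_LENGTH` environment variable before `ollama serve` applies a context length globally to every loaded model.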
@lramos15 Thanks for answering my previous questions—really helpful!
- I installed the latest VS Code Insiders build on Windows, checked for updates, and added one Ollama model and one OpenRouter model. Neither model shows up when switching to Agent Mode. What am I doing wrong?
- Context window: I'm still unclear on how to persistently set `num_ctx` for specific local models in GitHub Copilot when using Ollama. As far as I understand, the current options are:
  - System variable: `OLLAMA_CONTEXT_LENGTH=16000` sets it globally, but it's inflexible; it can't be changed per model or at runtime.
  - CLI `ollama run`: using `/set num_ctx 16000` works, but it needs to be run before every model load in Copilot, which is manual and unintuitive.
  - Modelfile: creating a new Modelfile with `PARAMETER num_ctx` from the original model seems cumbersome and not ideal for quick changes.
  - API: sending `"options": {"num_ctx": 16000}` in API requests works well and is how tools like Open WebUI handle it (see the sketch below). It looks like `num_ctx` should be handled by the calling application, i.e. GitHub Copilot via the options, which could also be stored as a VS Code setting so per-model options persist.

Given these limitations, is there a better way to define per-model `num_ctx` settings when using Ollama locally with Copilot?
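To illustrate the API option, here is roughly what a calling application would send to Ollama's native chat endpoint. A minimal, non-streaming sketch, assuming Ollama is on the default port; the model name and prompt are examples.

```python
# Sketch: pass num_ctx per request via Ollama's native /api/chat endpoint
# (the mechanism tools like Open WebUI use; values here are examples).
import json
import urllib.request

payload = {
    "model": "qwen2.5-coder:32b",
    "messages": [{"role": "user", "content": "Summarize what num_ctx controls."}],
    "stream": False,
    "options": {"num_ctx": 16000},  # per-request context window
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["message"]["content"])
```

The OpenAI-compatible `/v1/chat/completions` endpoint has no equivalent field, which is the limitation discussed above.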
I think you've listed all the possible options. I personally use OLLAMA_CONTEXT_LENGTH, and only load one model at a time as I don't have enough VRAM to load multiple.
We cannot do `num_ctx` due to the aforementioned OpenAI-compatible API surface. We would need to switch to the Ollama-specific API, which can be a bit challenging given that so many of our bring-your-own-key providers are built on top of OpenAI-compatible specs.
@lramos15
Sorry, I ninja edited my previous answer:
- I installed the latest VS Code Insiders build on Windows, checked for updates, and added one Ollama model and one OpenRouter model. Neither model shows up when switching to Agent Mode. What am I doing wrong?
- Added OpenRouter models
- They show up in Edit mode
- After switching to Agent Mode, no Ollama or OpenRouter models are shown:
99% of the free OpenRouter models don't support tool calls, so Agent Mode cannot work with them, since it requires tool calling. But as you can see, Gemini 2.5 Pro Experimental (free) is there.
Oh, I get it now; you said it in the beginning: GitHub Copilot actually respects the capabilities and only allows you to use models whose capabilities are set properly. If we want to try models that didn't set the tool calling capability, we would have to recreate an Ollama Modelfile with it set. Right?
@underlines Yes, but you'll likely have a bad experience if the model doesn't actually support tools.
I stumbled upon the same issue and coded up a little proxy script that logs the requests and changes `"max_tokens": 4096` to `"options": {"num_predict": 4096, "num_ctx": 70000}` on the fly for the `v1/chat/completions` endpoint. That helps with Ollama.
https://github.com/michael-heinrich/llm_proxy/blob/main/ollama_proxy.py
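Not the linked script itself, but a minimal sketch of the same idea for anyone curious: a small stdlib-only proxy that rewrites the request body before forwarding it to Ollama. The ports and context size are assumptions, and this sketch only handles non-streaming responses.

```python
# Sketch of a rewriting proxy: point the client at http://localhost:11435
# instead of Ollama, and the proxy replaces max_tokens with Ollama-native
# options (num_predict, num_ctx) before forwarding. Non-streaming only;
# ports and values are assumptions.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

OLLAMA = "http://localhost:11434"
NUM_CTX = 70000

class RewritingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if self.path == "/v1/chat/completions":
            # Move the token limit into Ollama's native options block.
            max_tokens = body.pop("max_tokens", 4096)
            body["options"] = {"num_predict": max_tokens, "num_ctx": NUM_CTX}
        data = json.dumps(body).encode("utf-8")
        upstream_req = urllib.request.Request(
            OLLAMA + self.path, data=data,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(upstream_req) as upstream:
            payload = upstream.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 11435), RewritingProxy).serve_forever()
```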
In my case, Gemini 2.5 Flash Preview doesn't show up in Agent Mode on VS Code Insiders; it is shown in Ask and Edit mode. Doing the same thing on the stable release, the model is shown in Agent Mode.
When adding a custom model to VS Code stable, it asks whether the model supports tool calling. I don't see this option when adding a custom model to VS Code Insiders.
In summary, the experience could be improved. I can think of the following improvements:
- When adding custom LLM providers/models, ask the user whether the model supports tool calling (there are models that support it without reporting it in their capabilities)
- Since @lramos15 mentioned the project wants to keep the generic OpenAI-compatible API and not the Ollama-specific endpoint, renaming the custom provider from "Ollama" to "OpenAI compatible" would clear things up.
- Allow users to set a custom endpoint host so that everybody can use their favorite inference engine (vLLM, llama.cpp server, Ollama, etc.)
Relevant: https://github.com/microsoft/vscode-copilot-release/issues/7289
How large a context window does Copilot need to operate effectively as an agent? I noticed that it runs in loops when I forget to set it to 16k. I'd like to know whether it works better with a larger context window, and whether there is a required minimum size.
Now that agentic capabilities are becoming more common in open-source models such as Qwen 3, this would be a prudent capability to revisit adding. Let's enable Agent Mode for Ollama-based model hosting.
+1
+1
A lot of the +1 comments should realize that this already works; it depends on the provider advertising tool calling capabilities to us.
@lramos15 Based on what I'm seeing, I've tried dozens of Ollama models in GitHub Copilot in VS Code, and none of them have Agent Mode available. Here for example, this is explicitly an agentic coding model, but when loaded into Ollama and VS Code's GitHub Copilot it only allows Ask or Edit, not Agent Mode: https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
I was able to use qwen3:8b (500a1f067a9f) with Copilot version 1.350.0 and VS Code 1.102.3, but it displays a lot of thinking (first
We really need this! Qwen3-coder is extremely fast, with a token rate of 2,000 tokens per second. It also has similar code quality to Claude 4 Sonnet.
Any reason why Ollama models aren't supported by the agent in VS Code yet? I can use Ollama only for Ask mode and not for Agent mode.
Closing as this does in fact work. Your model must advertise tool calling capabilities. Ollama also often doesn't launch with a large enough context window, so make sure you're launching it with the appropriate environment variables to increase the context window.
I recommend using OpenRouter; it appears to work best in Agent Mode for me.
Do you know which public models are known to work for Agent Mode with Copilot?
@prusswan I've tested that "DeepSeek: R1 0528" (deepseek/deepseek-r1-0528) on OpenRouter works well in Agent Mode. But I've found a gotcha: you must explicitly tell it to edit files, otherwise it might output in the chat panel instead of calling tools. Example:
#codebase For grammar strings that have indent in them (e.g. the one in tests\v2\test_grammar_validator.py:test_rule_with_no_collision), and are passed to (parser class).from_text, edit the relevant files to use textwrap.dedent for the grammar string, in the form of
from textwrap import dedent.
I haven't tested other models.
Hello, I’ve gone through all the messages posted (I came across the first mention about setting a custom URL for Ollama).
Regarding the compatibility of your model with agent mode, I’d like to suggest a solution. Basically, when selecting models in the model manager and going to the Ollama section, it would be helpful if VS Code displayed all models, but grayed out the ones that aren’t compatible with agent mode, or added a small disclaimer to warn users. Ideally, it could also redirect them to a documentation page explaining this in more detail.
I also think that perhaps we should reopen this issue, which, in my opinion, is not finished.
Thanks in advance.