Why is opencode not working with local llms via Ollama?
Question
Hello. I have tried numerous local LLMs with opencode and I cannot seem to get any of them to work. I have a decent PC that can run up to a 30b model smoothly, but I cannot get anything to work. Below is an example of what keeps happening. This is with llama3.2:3b.
Any help is appreciated.
EDIT: Added my config.
Hm, I don't think the Llama models will treat you very well. Also, what is your context size for Ollama?
Hm, I don't think the Llama models will treat you very well.
- What would you recommend?
Also, what is your context size for Ollama?
- Right now it is set at 32k. Is this too small?
I think that it's viewing OpenCode as the context instead of the actual project directory.
I think that it's viewing OpenCode as the context instead of the actual project directory.
Exactly what I was thinking as well. Seems weird.
What you are experiencing is actually common with local models. This is a classic tool-calling capability gap. I use Anthropic most of the time, which has native, robust tool-calling support.
But I also do a lot of local model stuff. Models like qwen3-coder:30b don't have it. I can see you're using llama3.2:3b, which is an incredibly small model; that is likely also part of the problem.
Some of the guys at unsloth have been working on this issue: https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/discussions/10
To provide you guys with some context here:
- Models must be specifically fine-tuned to understand structured tool calling
- They have to learn to output valid JSON in a specific format when tools are available (see the sketch after this list)
- The HuggingFace discussion on Qwen3-Coder states it struggles because:
  - It outputs XML-style tags (`<tool_call>`, `<function=...>`) instead of JSON
  - It hallucinates tool names (like `todonext`) that don't exist
  - It doesn't consistently follow the expected format
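To make the gap concrete, here is roughly the difference, sketched as OpenAI-style chat messages (both payloads are illustrative, not captured output):

```ts
// Roughly what a tool-capable model is expected to return: a structured,
// OpenAI-style tool call with the arguments as a JSON string.
const expected = {
  role: "assistant",
  content: null,
  tool_calls: [
    {
      id: "call_1",
      type: "function",
      function: { name: "read", arguments: '{"file_path":"/etc/hostname"}' },
    },
  ],
};

// Roughly what the struggling models return instead: XML-ish tags dumped into
// plain text, which the client has no reason to treat as anything but chat output.
const actual = {
  role: "assistant",
  content:
    "<tool_call>\n<function=read>\n<parameter=file_path>/etc/hostname</parameter>\n</function>\n</tool_call>",
};
```

The second one never reaches the tool-execution path at all; it just gets rendered as text.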
I've been thinking about this, and there might be a PR here in theory. I believe LM Studio has a transformation layer that intercepts the model's raw output, spots XML tool calls, and converts them to JSON format.
hmm....
Looking at /packages/opencode/src/provider/transform.ts, in theory one could add a new transformation for Ollama/local models that parses XML-style outputs, converts them to JSON tool calls, and then hooks in via wrapLanguageModel from the AI SDK. Something along these lines:
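A very rough sketch of that idea, assuming the AI SDK v4-style `wrapLanguageModel` / `LanguageModelV1Middleware` API; the parser, tool-call IDs, and wiring below are made up for illustration and are not opencode's actual code:

```ts
import { wrapLanguageModel, type LanguageModelV1Middleware } from "ai";

// Hypothetical helper: pull <function=NAME>...<parameter=KEY>VALUE</parameter>
// blocks out of plain-text output. A real version would need to be far more defensive.
function parseXmlToolCalls(text: string): { name: string; args: Record<string, string> }[] {
  return [...text.matchAll(/<function=([\w-]+)>([\s\S]*?)(?=<function=|$)/g)].map((fn) => ({
    name: fn[1],
    args: Object.fromEntries(
      [...fn[2].matchAll(/<parameter=([\w-]+)>([\s\S]*?)<\/parameter>/g)].map((p) => [p[1], p[2].trim()]),
    ),
  }));
}

// Middleware that rewrites XML-style "tool calls" found in the text into structured
// tool calls before anything downstream of the AI SDK sees the result.
const xmlToolCallMiddleware: LanguageModelV1Middleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();
    const parsed = parseXmlToolCalls(result.text ?? "");
    if (parsed.length === 0) return result;
    return {
      ...result,
      text: undefined,
      finishReason: "tool-calls" as const,
      toolCalls: parsed.map((call, i) => ({
        toolCallType: "function" as const,
        toolCallId: `xml-${i}`,
        toolName: call.name,
        args: JSON.stringify(call.args),
      })),
    };
  },
};

// Usage (hypothetical): wrap whatever model the provider layer constructed, e.g.
// const model = wrapLanguageModel({ model: ollamaModel, middleware: xmlToolCallMiddleware });
```

Streaming would need a matching wrapStream transform, which is a lot messier since the XML tags arrive split across chunks.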
maybe... I pulled this out my rear
Yeah, I think it's technically possible.
I started fiddling with this because I have ADHD and am now hyperfixated. This is def a challenge. I think the model itself might still be the limitation, though. TBD
But I also do a lot of local model stuff. Models like qwen3-coder:30b don't have it. I can see you're using llama3.2:3b, which is an incredibly small model; that is likely also part of the problem.
The issue isn't solely due to tiny models like llama3.2. This is an example with gpt-oss:20b:
(screenshot attached)
Yeah, it's the model's ability to call the correct tools and use them. I've been testing with qwen3-coder:30b for an hour now, running with an idea. It hallucinates tool names nonstop, so an even smaller model will do even worse.
I built an adapter that catches when Qwen+Ollama mess up tool names, uses fuzzy matching to figure out what they actually meant (like mapping exec to bash), and fixes the call before it fails (rough sketch after the list below).
That was the easy part .... but it's still 💩
The problem is I get inconsistent behavior out the wazoo.
- Sometimes outputs XML as plain text:
  `<function=files></function></tool_call>`
- Sometimes makes actual tool calls (but with wrong arguments)
- Sometimes just describes what it would do instead of doing it :|
TL;DR:
- ✅ Fuzzy matching works
- ✅ Alias mapping works
- ❌ Model still can't edit files
- ❌ Model still can't run commands
- ❌ User experience is broken
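For reference, the alias/fuzzy-matching part of that adapter looked roughly like this (a minimal sketch; the alias table, the known-tool list, and the distance threshold are made up for illustration and are not opencode's actual tool registry):

```ts
// Map of tool names the model tends to hallucinate to tools that actually exist.
const TOOL_ALIASES: Record<string, string> = {
  exec: "bash",
  shell: "bash",
  run_command: "bash",
  read_file: "read",
  write_file: "write",
};

// Illustrative list of real tool names.
const KNOWN_TOOLS = ["bash", "read", "write", "edit", "grep", "glob"];

// Plain Levenshtein distance, used as a last resort when no alias matches.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
    }
  }
  return dp[a.length][b.length];
}

// Resolve a possibly-hallucinated tool name to a real one, or null if nothing is close enough.
function resolveToolName(name: string): string | null {
  const lower = name.toLowerCase();
  if (KNOWN_TOOLS.includes(lower)) return lower;
  if (TOOL_ALIASES[lower]) return TOOL_ALIASES[lower];
  let best: { tool: string; dist: number } | null = null;
  for (const tool of KNOWN_TOOLS) {
    const dist = levenshtein(lower, tool);
    if (!best || dist < best.dist) best = { tool, dist };
  }
  return best && best.dist <= 2 ? best.tool : null;
}

// resolveToolName("exec")     -> "bash"
// resolveToolName("todonext") -> null (rejected rather than guessed)
```

Returning null for anything that isn't close beats guessing: a wrong-but-plausible tool call fails in much more confusing ways than a rejected one.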
I gotta think on this one. Maybe test more models. It might just be that the model itself is the limitation. Claude works great though. I love this TUI. I just discovered this whole project randomly earlier today when browsing Reddit about Ollama.
I have heard most people have better experiences with LM Studio than with Ollama.
There isn't a ton we can do on our end besides maybe extra prompting for local models?
Just tested both Ollama and LM Studio on Opencode with multiple models the past couple days.
I ran into many issues with Ollama (mostly that all models default to a very small context window, so you have to modify them or find variants with bigger context-window settings, plus tool-call formatting issues). After installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools, and had varied success with openai/gpt-oss-20b, though it was still usable for shorter tasks.
TL;DR: ditch Ollama, try LM Studio
I was able to verify today that the MODEL (qwen3-coder:30b-a3b-q4_K_M) actually is making tool calls. The disconnect was happening somewhere between Ollama and the AI SDK. Pretty sure Ollama's parser layer is the problem.
By the time it got to the SDK it was plain text.
// What Ollama returns:
```json
{
  "content": "<function=read>\n<parameter=file_path>/etc/hostname</parameter>"
}
```
So need to intercept that at the middleware layer BEFORE it hits the AI SDK. But there might be another issue... TBD
I'll come back to this if I figure something out
Ollama has two API surfaces: a "native" one that most Ollama clients seem to use, and a claimed OpenAI-compatible surface (the /v1-prefixed endpoints). From dev work I have done against them, the OpenAI endpoints are not entirely OpenAI-compatible; there are subtle issues both on normal streaming calls and on tool use. I see that opencode suggests treating Ollama as OpenAI-compatible, and their doc references Llama 2, which is ancient; not sure what to say about that. I would think acting as an Ollama client via the OpenAI endpoints isn't really going to work.
Ollama requires extra configuration to use context sizes larger than 8k. Either the model has to know it itself, or the server has to be told at startup via environment variables or some other mechanism.
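For example, something along these lines works against the native API (a sketch; the model name and the 32k value are placeholders):

```ts
// Quick check that a larger context is honored, via Ollama's native /api/chat endpoint.
// Per-request options like num_ctx are an Ollama-specific extension of that API.
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  body: JSON.stringify({
    model: "qwen3-coder:30b", // placeholder model name
    messages: [{ role: "user", content: "ping" }],
    options: { num_ctx: 32768 }, // ask for a 32k context for this request
    stream: false,
  }),
});
console.log(await response.json());

// Alternatively, bake it into a model variant once:
//   # Modelfile
//   FROM qwen3-coder:30b
//   PARAMETER num_ctx 32768
// then `ollama create qwen3-coder-32k -f Modelfile` and point the client at that model.
```

As far as I can tell, the /v1 OpenAI-compatible endpoints don't take these per-request options, which is why the Modelfile or server-level route is the one that matters for clients like opencode.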
llama3.2:3b is really not very capable; I would not expect it to actually be able to do tool calls. I would only use it for very simple, small classification tasks. qwen3-coder with full context should be fine for tool use, code analysis, and small generation tasks.
I had been using Cline against Ollama with the qwen3-coder models (30b) very successfully, but for reasons switched to llama.cpp. Cline seems to have context management issues when not talking to Ollama or to a foundation model, so I was looking for a replacement. In limited testing so far, opencode is working well with qwen3-coder 30b against llama.cpp, able to do the full range of tool-use tasks I would expect.
Yeah - I was able to get further down the rabbit hole if I turned off streaming.... this is just.... not the experience I imagine the dev team wants in the app. There is probably a solution here somewhere... but it's def one that is going to take a touch longer than vibe coding a solution in an hour. That ain't happening.
Still, props to the dev team. I used this a bunch today with Claude and it might be my new favorite interface. The keybindings are great, the config is easy to set up; you can tell it was crafted with love by NVIM users.
Thanks everyone for the help and for looking into it. I will definitely switch models.
I completely agree. This tool is awesome. I will be using it for a while I suspect. Unless anyone has anything else I'll probably close this out as done tomorrow. Thanks again!
We can prolly ask the ollama team what they'd recommend
I pretty much had the exact same experience, not only with opencode but also with Void, using Ollama with LiteLLM in between.
Next I am going to test vLLM because it seems to be way more efficient as well. Does anyone have experience with it already?
I saw this being asked a few times in LocalLLM on Reddit and realized many are still trying. I'll have to share what my attempt was, and maybe some of us can work up a solution to put into a PR.
I wish the opencode docs contained not only recommended cloud models but also a list of open-source models that work well (or are at least somewhat functional, if there are any), together with the required config changes or workarounds, like increasing the context window size. Preferably not just for those with multiple 5090s, but also for those limited to smaller sizes: 3b, 8b, or 30b.
So far I've only managed to get a single 30b local model to emit valid tooling commands (after lots of searching, trying and getting annoyed at the current situation+docs).
I was able to verify today that the MODEL (qwen3-coder:30b-a3b-q4_K_M) actually is making tool calls. The disconnect was happening somewhere between Ollama and the AI SDK. Pretty sure Ollama's parser layer is the problem.
By the time it got to the SDK it was plain text.
// What Ollama returns: { "content": "<function=read>\n<parameter=file_path>/etc/hostname</parameter>" }
So need to intercept that at the middleware layer BEFORE it hits the AI SDK. But there might be another issue... TBD
I'll come back to this if I figure something out
Hey, yeah, I saw qwen3-coder:30b making calls too. I tried gpt-oss:latest, which is a 20b model; it sometimes fails when making tool calls, which is interesting. I have another problem you may have encountered: my model runs grep directly in the home folder. Any idea why this happens?
(I run opencode in the repo folder, and my repo folders are on an external drive, as you can see at the bottom right of the image.)
NOTE: It works fine with cloud model providers (claude, zen, openai, etc.); they use grep and other commands in the repo folder.