bug: doesn't work with ollama models
### Describe the bug
I ran into a few issues that made the plugin basically unusable:

When I tried it with my local homelab Ollama models (deepseek-r1:14b, qwen2.5-coder:14b, gemma3:12b), I couldn't get any proper results. On top of that, the UI always got stuck on "generating" or "tool calling," and the only way to recover was to quit Neovim; the hotkeys to cancel the request simply didn't work.

Then I gave it a try on a Vast.ai machine with 2×3090 GPUs, testing llama3.1:70b and devstral:24b, but the experience was exactly the same.
My provider config is as follows:
```lua
providers = {
  ollama_llama = {
    __inherited_from = "ollama",
    model = "llama3.1:70b",
    endpoint = vim.env.OLLAMA_URL,
    timeout = 90000,
    extra_request_body = {
      options = {
        num_ctx = 8192,
        temperature = 0.1,
      },
    },
  },
},
```
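For reference, Ollama's REST API expects `num_ctx` nested under `options` in the JSON request body, which is what the `extra_request_body` above should ultimately produce. A sketch of the expected shape (the `prompt` value is a placeholder):

```json
{
  "model": "llama3.1:70b",
  "prompt": "hello",
  "stream": false,
  "options": {
    "num_ctx": 8192,
    "temperature": 0.1
  }
}
```

If the client sends `num_ctx` at the top level instead of inside `options`, Ollama silently ignores it and falls back to its default context length.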
### To reproduce

No response

### Expected behavior

No response
### Installation method
Use lazy.nvim:
```lua
{
  "yetone/avante.nvim",
  event = "VeryLazy",
  lazy = false,
  version = false, -- set this if you want to always pull the latest change
  opts = {
    -- add any opts here
  },
  -- if you want to build from source then do `make BUILD_FROM_SOURCE=true`
  build = "make",
  -- build = "powershell -ExecutionPolicy Bypass -File Build.ps1 -BuildFromSource false" -- for windows
  dependencies = {
    "nvim-lua/plenary.nvim",
    "MunifTanjim/nui.nvim",
  },
}
```
### Environment

nvim: v0.11.5
avante.nvim: `{ "branch": "main", "commit": "44b594863c1abf72690ae82651fb70c0b9adeeaa" }`
OS: Arch Linux
### Repro
```lua
vim.env.LAZY_STDPATH = ".repro"
load(vim.fn.system("curl -s https://raw.githubusercontent.com/folke/lazy.nvim/main/bootstrap.lua"))()
require("lazy.minit").repro({
  spec = {
    -- add any other plugins here
  },
})
```
The same happens on a MacBook Pro M3 Max (48 GB unified RAM): NVIM v0.11.3, avante.nvim commit ca95e0386433da2077184719886fa658257261a3, OS: macOS 15.6.
I found that when running `ollama ps`, the model was timing out (being unloaded). Setting `$env.OLLAMA_KEEP_ALIVE = "30m"` in Nushell's config fixed it.
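Note that the keep-alive duration can also be set per request in the Ollama API body instead of via the environment variable; a sketch of the field (the model and prompt are placeholders):

```json
{
  "model": "qwen2.5-coder:14b",
  "prompt": "hello",
  "keep_alive": "30m"
}
```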
Never mind, that seems inconsistent for some reason...
I think the reason is that the client sends `num_ctx` in the wrong manner, so Ollama falls back to its 4k default. I set the default context length in the `OLLAMA_CONTEXT_LENGTH` env var of the Ollama systemd service, and it fixed the issue.
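For anyone wanting to replicate the systemd workaround, the override looks roughly like this (a sketch; the 8192 value is an assumption matching the `num_ctx` in my config above):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# created via: sudo systemctl edit ollama
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=8192"
```

Then run `sudo systemctl daemon-reload && sudo systemctl restart ollama` to apply it.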
Anyway, I haven't been able to achieve anything good with ~30b models, except qwen:32b, and have given up on Ollama.
Also, creating a separate Modelfile seems to help.
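A minimal Modelfile for that would look something like this (the base model and context size are examples, not a known-good recipe):

```
# Modelfile — bake the context length into a derived model
FROM qwen2.5-coder:14b
PARAMETER num_ctx 8192
```

Built with something like `ollama create qwen2.5-coder-8k -f Modelfile`, after which the derived model can be referenced in the avante.nvim provider config instead of the base one.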