bug: doesn't work with ollama models
### Describe the bug
I ran into a few issues that made the plugin basically unusable:

When I tried it with my local homelab Ollama models (deepseek-r1:14b, qwen2.5-coder:14b, gemma3:12b), I couldn't get any proper results. On top of that, the UI always got stuck on "generating" or "tool calling," and the only way to recover was to quit Neovim; the hotkeys to cancel the request simply didn't work.

Then I gave it a try on a Vast.ai machine with 2×3090 GPUs, testing llama3.1:70b and devstral:24b, but the experience was exactly the same.
My provider config is as follows:
```lua
providers = {
  ollama_llama = {
    __inherited_from = "ollama",
    model = "llama3.1:70b",
    endpoint = vim.env.OLLAMA_URL,
    timeout = 90000,
    extra_request_body = {
      options = {
        num_ctx = 8192,
        temperature = 0.1,
      },
    },
  },
},
```
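For reference, Ollama's REST API expects `num_ctx` nested under `options` in the JSON request body, which is what the `extra_request_body` above should ultimately produce. A sketch of the expected shape (the `prompt` value is a placeholder):

```json
{
  "model": "llama3.1:70b",
  "prompt": "hello",
  "stream": false,
  "options": {
    "num_ctx": 8192,
    "temperature": 0.1
  }
}
```

If the client sends `num_ctx` at the top level instead of inside `options`, Ollama silently ignores it and falls back to its default context length.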
### To reproduce

No response

### Expected behavior

No response
### Installation method
Use lazy.nvim:
```lua
{
  "yetone/avante.nvim",
  event = "VeryLazy",
  lazy = false,
  version = false, -- set this if you want to always pull the latest change
  opts = {
    -- add any opts here
  },
  -- if you want to build from source then do `make BUILD_FROM_SOURCE=true`
  build = "make",
  -- build = "powershell -ExecutionPolicy Bypass -File Build.ps1 -BuildFromSource false" -- for windows
  dependencies = {
    "nvim-lua/plenary.nvim",
    "MunifTanjim/nui.nvim",
  },
}
```
### Environment

nvim: v0.11.5
avante.nvim: `{ "branch": "main", "commit": "44b594863c1abf72690ae82651fb70c0b9adeeaa" }`
OS: Arch Linux
### Repro
```lua
vim.env.LAZY_STDPATH = ".repro"
load(vim.fn.system("curl -s https://raw.githubusercontent.com/folke/lazy.nvim/main/bootstrap.lua"))()
require("lazy.minit").repro({
  spec = {
    -- add any other plugins here
  },
})
```
The same happens on a MacBook Pro M3 Max (48 GB unified RAM): NVIM v0.11.3, avante.nvim commit ca95e0386433da2077184719886fa658257261a3, OS: macOS 15.6.
I found that when running `ollama ps`, the model was timing out (being unloaded). Setting `$env.OLLAMA_KEEP_ALIVE = "30m"` in Nushell's config fixed it.
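Note that the keep-alive duration can also be set per request in the Ollama API body instead of via the environment variable; a sketch of the field (the model and prompt are placeholders):

```json
{
  "model": "qwen2.5-coder:14b",
  "prompt": "hello",
  "keep_alive": "30m"
}
```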
Never mind, that seems inconsistent for some reason...
I think the reason is that the client sends `num_ctx` in the wrong manner, so Ollama falls back to its 4k default. I set the default context length in the `OLLAMA_CONTEXT_LENGTH` env var of the Ollama systemd service, and it fixed the issue.
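For anyone wanting to replicate the systemd workaround, the override looks roughly like this (a sketch; the 8192 value is an assumption matching the `num_ctx` in my config above):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# created via: sudo systemctl edit ollama
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=8192"
```

Then run `sudo systemctl daemon-reload && sudo systemctl restart ollama` to apply it.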
Anyway, I haven't been able to achieve anything good with ~30b models, except qwen:32b, and have given up on Ollama.
Also, creating a separate Modelfile seems to help.
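A minimal Modelfile for that would look something like this (the base model and context size are examples, not a known-good recipe):

```
# Modelfile — bake the context length into a derived model
FROM qwen2.5-coder:14b
PARAMETER num_ctx 8192
```

Built with something like `ollama create qwen2.5-coder-8k -f Modelfile`, after which the derived model can be referenced in the avante.nvim provider config instead of the base one.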