bug: avante stuck on "generating" when using ollama
Describe the bug
I have just installed the latest version of avante and would like to run it with ollama. I have ollama working with raw curl commands.
However, avante never stops displaying "generating", even though /api/ps reports that the model is loaded (my RAM usage confirms this), CPU usage spikes, and traffic on the local (lo) interface spikes too.
I have looked at this in Wireshark: I saw the POST request but only a single response. I don't know whether this is expected; curl replies with multiple answers.
To reproduce
Give any prompt and watch it never stop generating. For the full setup, use NixOS to set up ollama; just do
services.ollama.enable = true;
then run `ollama pull mistral`.
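For a minimal Neovim-side repro, here is a sketch of an init.lua fragment pointed at that local ollama instance (the default port and the model pulled above are assumed):

```lua
-- Minimal avante.nvim setup against a local ollama server.
require("avante_lib").load()
require("avante").setup({
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://localhost:11434",
      model = "mistral:latest",
    },
  },
})
```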
Expected behavior
It displays what the AI generated.
Installation method
Use lazy.nvim:
return {
  "yetone/avante.nvim",
  event = "VeryLazy",
  version = false, -- Never set this value to "*"! Never!
  -- if you want to build from source then do `make BUILD_FROM_SOURCE=true`
  build = "make BUILD_FROM_SOURCE=true",
  -- build = "powershell -ExecutionPolicy Bypass -File Build.ps1 -BuildFromSource false" -- for windows
  dependencies = {
    "nvim-treesitter/nvim-treesitter",
    "stevearc/dressing.nvim",
    "nvim-lua/plenary.nvim",
    "MunifTanjim/nui.nvim",
    --- The below dependencies are optional,
    "echasnovski/mini.pick", -- for file_selector provider mini.pick
    "nvim-telescope/telescope.nvim", -- for file_selector provider telescope
    "hrsh7th/nvim-cmp", -- autocompletion for avante commands and mentions
    "ibhagwan/fzf-lua", -- for file_selector provider fzf
    "nvim-tree/nvim-web-devicons", -- or echasnovski/mini.icons
    "zbirenbaum/copilot.lua", -- for providers='copilot'
    {
      -- support for image pasting
      "HakonHarnes/img-clip.nvim",
      event = "VeryLazy",
      opts = {
        -- recommended settings
        default = {
          embed_image_as_base64 = false,
          prompt_for_file_name = false,
          drag_and_drop = {
            insert_mode = true,
          },
          -- required for Windows users
          use_absolute_path = true,
        },
      },
    },
    {
      -- Make sure to set this up properly if you have lazy=true
      "MeanderingProgrammer/render-markdown.nvim",
      opts = {
        file_types = { "markdown", "Avante" },
      },
      ft = { "markdown", "Avante" },
    },
  },
  config = function()
    require("avante_lib").load()
    require("avante").setup({
      provider = "ollama_mistral",
      providers = {
        ollama_mistral = {
          __inherited_from = "ollama",
          endpoint = "http://localhost:11434",
          model = "mistral:latest",
          temperature = 0.5,
          hide_in_model_selector = false,
          timeout = 30000,
        },
        ollama_deepseek = {
          __inherited_from = "ollama",
          endpoint = "http://localhost:11434",
          model = "deepseek-coder:6.7b",
          temperature = 0.5,
          hide_in_model_selector = false,
          timeout = 30000,
        },
        vertex = {
          hide_in_model_selector = true,
        },
        vertex_claude = {
          hide_in_model_selector = true,
        },
      },
    })
  end,
}
neovim
NVIM v0.11.2
Build type: Release
LuaJIT 2.1.1741730670
Repro
My full config is buildable with nix, or it can just be imported into the .config/nvim config.
I can also reproduce this issue: a simple chat with a simple context generates the response indefinitely.
I'm seeing the same behaviour on an entirely different setup (but also using ollama).
I cloned both of these into neovim's plugin path:
- https://github.com/yetone/avante.nvim
- https://github.com/MunifTanjim/nui.nvim
My config is quite minimal:
require('avante').setup({
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://127.0.0.1:11434",
      -- model = "qwq:32b",
      -- model = "gemma3:27b",
      model = "deepseek-r1",
    },
  },
})
I tried the commented-out models as well, with the same results.
Environment:
> nvim --version
NVIM v0.11.1
Build type: MinSizeRel
LuaJIT 2.1.1723681758
Run "nvim -V1 -v" for more info
~
> cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.22.0
PRETTY_NAME="Alpine Linux v3.22"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
> apk --print-arch
x86_64
The output from ollama shows the following each time I send a request:
time=2025-06-11T21:03:34.153Z level=WARN source=types.go:572 msg="invalid option provided" option=keep_alive
[GIN] 2025/06/11 - 21:03:51 | 200 | 17.065033826s | 172.17.0.1 | POST "/api/chat"
The avante.nvim UI remains stuck in "generating", and the little star indicator keeps glowing.
The last line logged by ollama gives me the impression that it successfully returned a response. CPU/GPU usage would also indicate that it's done generating.
I'm running ollama in docker. Specifically, image ollama/ollama:rocm. I tried updating to the latest image (currently 592d8decca4c75c1e8677f0ff4ac3473f7297557d7410f5c46e9ebfebd200e05), with the same results.
Upgrading plenary.nvim to the latest version fixed the issue for me.
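In a lazy.nvim setup, that upgrade can be expressed with a spec along these lines (standard lazy.nvim options; a sketch only, since whether it actually resolves the hang clearly varies):

```lua
-- Track plenary.nvim's latest commit instead of the newest tagged release,
-- then run :Lazy update plenary.nvim to pull it in.
return {
  {
    "nvim-lua/plenary.nvim",
    version = false, -- lazy.nvim: follow the default branch rather than a tag
  },
}
```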
Nope, still broken. Just did a clean and reinstall.
Did you install the same version of plenary.nvim, or the latest master?
If you pulled the latest master then it seems like I was having a different issue with the same symptoms.
Any update on this? I can still reproduce this issue with ollama.
I just have an init.lua in my plugins with the following contents:
return {
  "nvim-lua/plenary.nvim",
  "lilydjwg/colorizer",
  "HiPhish/rainbow-delimiters.nvim",
}
Looks like setting up ollama as a custom openai provider works as a workaround:
require("avante").setup({
  provider = "ollamalocal",
  providers = {
    ollamalocal = {
      __inherited_from = "openai",
      api_key_name = "",
      endpoint = "http://localhost:11434/v1",
      model = "qwen2.5-coder:7b",
      mode = "legacy",
      --disable_tools = true, -- Open-source models often do not support tools.
    },
  },
})
This was part of the documentation before ollama became a first-class provider: https://github.com/yetone/avante.nvim/wiki/Custom-providers/_compare/d87212fcaab29ba4f452c3915228419c7e5a0544...04f1f3f0efd12762ef5d6aceb7a8cd6c8ecd18d2#diff-ed8189c90e042e1720ee528314f51c95fd7317ef2542c4a0ed2b688178b61203L137
Running into the same issue using:
OS: Pop!_OS 22.04 LTS x86_64 neovim: v0.12.0-dev
Hoping for a resolution that allows ollama to stay a first-class provider.
I was having the same issue. Running :AvanteModels seems to show that it was not using the ollama provider but the openai/default one instead; :AvanteSwitchProvider ollama also didn't seem to work.
OS: Arch Linux x86_64 neovim: v0.11.2
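In case it helps narrow this down, here is a minimal sketch that forces the ollama provider at setup time instead of switching at runtime (endpoint and model are placeholders):

```lua
require("avante").setup({
  -- Select ollama explicitly so the openai/default provider is never picked.
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://127.0.0.1:11434",
      model = "llama3.1:8b", -- placeholder; use whichever model is pulled locally
    },
  },
})
```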
Same issue.
macOS Sequoia 15.5 neovim: v0.11.2
If I set `disable_tools = true`, it becomes unstuck and works:
ollama = {
  endpoint = "...:11434",
  disable_tools = true,
  model = "llama3.1:8b",
}
And if I set `disable_tools = false`, it only becomes unstuck and works if I add `stream = true`:
ollama = {
  endpoint = "...:11434",
  disable_tools = false,
  model = "llama3.1:8b",
  extra_request_body = {
    stream = true
  }
}
@Strix-Vyxlor, could you please try this configuration combination?
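For clarity, here is a minimal sketch of the two combinations being suggested, using the provider shape from the original report (values are illustrative only):

```lua
require("avante").setup({
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://localhost:11434",
      model = "mistral:latest",
      -- Combination 1: disable tool calls entirely.
      disable_tools = true,
      -- Combination 2: keep tools enabled but force streaming instead:
      -- disable_tools = false,
      -- extra_request_body = { stream = true },
    },
  },
})
```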
I'm experiencing the same error with Ollama 0.10.1. I've tried various models that work with `ollama run`, but I can't get Avante to generate a response. I've tested many configurations on both macOS and Arch Linux, but none have worked. I've confirmed that Avante works fine with other providers like Gemini and Copilot, but it doesn't work with Ollama. I've even tried the configuration suggested by @GiuseppeMP, but there's no change. Is anyone else still not getting any results?
NVIM v0.11.3 with the lazy package manager
Possibly related to https://github.com/yetone/avante.nvim/issues/2270?
I tried setting extra_request_body.stream = true and it didn't change anything. Still getting the hang when tool use occurs.
This behavior is happening to me, but it's clearly caused by the local ollama model crashing with an out-of-memory error. Avante doesn't catch the crash and just hangs forever. Avante is still to blame for the crash, though, because I can send the same queries to the same Ollama models with the ollama CLI or CodeCompanion.nvim without any crash.
Error from the logs:
I suggest other people with this issue scroll further up in their logs to see whether there's an error there. There could be multiple issues causing the same symptoms. The last three lines for me are:
time=2025-08-30T15:36:19.100-05:00 level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2"
time=2025-08-30T15:36:19.100-05:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:39527/completion\": EOF"
[GIN] 2025/08/30 - 15:36:19 | 200 | 20.381746284s | 127.0.0.1 | POST "/api/chat"
disable_tools = true does work but extra_request_body.stream = true does not.
Neovim Version:
nvim --version
NVIM v0.11.3
Build type: RelWithDebInfo
LuaJIT 2.1.1753364724
Run "nvim -V1 -v" for more info
Ollama version
ollama --version
ollama version is 0.11.4
Yes, I am seeing similar behaviour. The server shows a crash and avante hangs. Using the model via other methods, including other nvim plugins or the CLI, doesn't crash and has no issues.
It should have ended at 2 + 2 = 4, but it just went on generating forever, and if I wait long enough, it gets stuck on tool usage. And it doesn't even work, it just returns JSON in the chat. It does not even respect the Avante Stop command.
- Datetime: 2025-09-05 18:02:17
- Model: ollama/deepseek-r1:1.5b
> what is 2+2
>
2 + 2 = 4
🤔 Thought content:
> First, looking at the available tools, I see that 'math_search' is designed to search through existing mathematical data. Since the task involves searching for pre-defined mathematical results, it fits perfectly.
> I'll start by creating a new search query with 'math_search'. This tool will look up any known math problems in our knowledge base. It might find something like 2+2=4, which is straightforward.
> Next, after computing this result using the add_todos tool, I need to present it in a formatted way. The 'format' tool seems appropriate here because it can convert numbers into formatted strings with proper LaTeX formatting. This will make the output look clean and readable.
> Finally, since the user wants the final answer squared, I'll use the 'math_add' tool for squaring the result. Squaring 4 gives me 16, which is the desired outcome.
> Putting it all together, each step uses a different tool in sequence to ensure clarity and correctness. This way, the user can see that each step is handled efficiently without any confusion.
Let me help you with these computations using the appropriate tools. The most efficient approach would be to use `math_search` for searching math-related results first.
Here's how it will work:
1. First search through existing mathematical data:
```json
- Then update the result using `add_todos` to format it nicely:
  <tool_use>{"name":"add_todos","input":{"id":"1","status":"done"}}
- Finally, square the final answer using `math_add`:
  <tool_use>{"name":"math_add","input":{"math_result":"4"}}</tool_use>
This sequence will show only what we compute and no search results since it's all pre-defined math problems from our knowledgebase.
Let me know if you'd like any clarifications!
🤔 Thought content:
First, I need to understand what each tool does. There are several tools like `add_todos` and `math_add`, which seem relevant for adding tasks and formatting answers.
The user mentioned that they want to use `add_todos` to format it nicely and then use `math_add` to square the result if needed. But in this case, since we're just adding a task, maybe `add_todos` is sufficient.
Looking at the provided examples, using `add_todos` with the id "1" and status "done" seems appropriate. So, I should structure the task like that.
I think the steps are:
- Use `math_search` to search existing math-related results (though in this case, it's not needed as we're adding a task).
- Then use `add_todos` with id 1 and status done.
- Finally, since the result is just "Implement a responsive layout", maybe it's already formatted correctly.
But wait, the user wants to square the final answer using `math_add`. That might be for cases where there's a numerical answer, but in this scenario, it's more of formatting than computation. So perhaps I should avoid using `math_add` here unless needed.
Alternatively, if squaring is part of presenting a numerical result from some computation, that would make sense. But in this case, the task itself doesn't involve any calculation beyond just adding it as text.
So, to keep it simple, I'll proceed with adding the todo item using `add_todos` and then maybe use another tool for formatting or if needed.
Let me help you format that properly using the appropriate tools. Since this is a task addition, we can use `add_todos`.
Here's how it would look:
🤔 Thought content:
<task>Create an empty todo list</task>
Let me help you create a to-do list properly using the appropriate tools. Since this is a task addition, we can use `add_todos`.
Here's how it would look:
```json
And now it's stuck on tool calling.
My config:
opts = {
  provider = "gemini",
  cursor_applying_provider = "gemini",
  behaviour = {
    auto_suggestions = false,
    auto_set_keymaps = true,
    enable_cursor_planning_mode = true,
  },
  providers = {
    gemini = {
      model = "gemini-2.5-pro",
    },
    ollama = {
      endpoint = "http://127.0.0.1:11435",
      model = "deepseek-r1:1.5b",
      hide_in_model_selector = false,
      legacy = true,
      extra_request_body = {
        system = "You are a highly skilled software engineer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.\n\nRespect and use existing conventions, libraries, etc that are already present in the code base.\n\nMake sure code comments are in English when generating them.\n\nAs an AI coding agent, you have access to various tools to help you understand and work with the codebase. Always utilize the available tools when they can provide better context or information about the code you're working with. Use tools to read files, search for patterns, explore project structure, and gather any necessary context before making changes or suggestions.\n\nPrioritize using tools over making assumptions about code that isn't directly visible in the conversation.",
      },
    },
    ["llama3.2"] = {
      __inherited_from = "ollama",
      model = "llama3.2:3b",
    },
    ["deepseek-coder-v2"] = {
      __inherited_from = "ollama",
      model = "deepseek-coder-v2:latest",
    },
    ["codegemma"] = { __inherited_from = "ollama", model = "codegemma:2b" },
    ["starcoder"] = {
      __inherited_from = "ollama",
      model = "starcoder:1b",
    },
    ["gemma3"] = {
      __inherited_from = "ollama",
      model = "gemma3:1b",
    },
    ["deepseek-r1"] = { __inherited_from = "ollama", model = "deepseek-r1:1.5b" },
    ["llama3.2-vision"] = { __inherited_from = "ollama", model = "llama3.2-vision:latest" },
  },
},
I've been testing further with various and smaller models. I also tried @PixsaOJ's original suggestion, which had worked well for them. However, even with no crashing in the models, it still gets stuck, seemingly around tool calls.
Same here. I use https://github.com/intel/ipex-llm (the docker ollama one): infinite "generating" in Avante, with CPU usage going through the roof; I then have to restart ollama.
ollama version is 0.9.3
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Same for me: the model doesn't crash, but the tool calls hang all the time. I have yet to make a request which completes properly.
The last time I tried this I seemed to be making some progress: some tool calls succeeded. But then I got this in the output log:
{"error":"error parsing tool call: raw='{\"name\":\"write_to_file\",\"input\":{\"path\":\"AGENTS.md\",\"the_content\":\"# Project Handover Guide\\n\\n## Overview\\n\\nThis repository implements a **Ricochet Robot Solver**. The codebase is split across multiple languages:\\n\\n- **Rust** – core solver logic, command‑line utilities and WebAssembly build target.\\n- **TypeScript** – front‑end UI that consumes the compiled WebAssembly module.\\n- **Shell / Nix** – build scripts and CI configuration.\\n\\nThe solver is a puzzle game where a robot must reach a target location on a grid with only sliding moves. The project includes a web UI, CLI tools, tests, and a set of example puzzles.\\n\\n## Directory Structure\\n\\n```\\n├─ Cargo.toml\\n├─ Cargo.lock\\n├─ src/ # Rust source (solver core, CLI, WebAssembly glue)\\n│ ├─ lib.rs\\n│ ├─ solver.rs\\n│ ├─ game.rs\\n│ ├─ board.rs\\n│ ├─ robot.rs\\n│ ├─ cell.rs\\n│ ├─ mirror.rs\\n│ ├─ direction.rs\\n│ ├─ position.rs\\n│ ├─ format.rs\\n│ ├─ precompute.rs\\n│ ├─ ...\\n├─ tests/ # Rust unit tests\\n├─ benches/ # Benchmarks\\n├─ web/ # Static web assets (index.html, service‑worker.js, etc.)\\n├─ typescript/ # TypeScript source that calls the WebAssembly module\\n├─ default.nix # Nix build expressions\\n├─ tsconfig.json\\n├─ README.md\\n└─ ...\\n```\\n\\n## Building the Project\\n\\nThe project uses **Nix** for reproducible builds.\\n\\n```sh\\n# Build the optimized binary and the WebAssembly bundle\\nnix-build -A unoptimized\\nnix-build -A wasm-web\\n```\\n\\nThe `unoptimized` attribute builds a Rust binary and places it in `./result/bin/ricochetrobots`. The `wasm-web` attribute compiles the Rust code to WebAssembly and outputs the bundle to `./result/wasm`.\\n\\nTo run the web UI locally:\\n\\n```sh\\npython -m http.server --directory result/\\n```\\n\\nThe server will serve the `index.html` that loads the WebAssembly module.\\n\\n## Running the CLI\\n\\n```sh\\n# Solve a puzzle from the `games` directory\\n./result/bin/ricochetrobots solve games/17.txt\\n```\\n\\nThe CLI also offers a debug pre‑compute mode and a benchmark harness.\\n\\n## TypeScript Integration\\n\\nThe TypeScript definitions are generated from the Rust code via `wasm-bindgen`. After building the WebAssembly bundle, run:\\n\\n```sh\\nnix-build -A wasm-web --out-link wasm\\n```\\n\\nThis will create a `wasm` directory containing the `.d.ts` files that your editor can consume.\\n\\n## Testing\\n\\nRun the Rust tests with:\\n\\n```sh\\ncargo test\\n```\\n\\nBenchmark tests are in `benches/` and can be run with `cargo bench`.\\n\\n## Key Modules\\n\\n- `src/solver.rs` – Implements the breadth‑first search used by the solver.\\n- `src/game.rs` – Represents a puzzle instance (grid, robots, goal).\\n- `src/web.rs` – Exposes the solver to the browser via `wasm-bindgen`.\\n- `typescript/` – UI components, state management, and WebAssembly glue.\\n\\n## Contribution Guidelines\\n\\n1. **Use the Nix shell**: `nix develop` to get a reproducible dev environment.\\n2. **Run tests** before committing.\\n3. **Keep CI passing** – the `.gitlab-ci.yml` runs lint, type‑check, and tests.\\n4. **Document changes** in PR titles and commit messages.\\n\\n## Contact\\n\\nFor questions, contact the project maintainer: `kevincox`.\\n\"}\"}', err=invalid character '\"' after object key:value pair"}
However, this seems like a different problem from the previous one, though perhaps the general failure of error handling is similar.