
bug: Using ollama stuck on generating

Open Strix-Vyxlor opened this issue 6 months ago • 21 comments

Describe the bug

I have just installed the latest version of avante and would like to run it with ollama. I got ollama working with raw curl commands.

But avante never stops displaying "generating", even though /api/ps returns that the model is loaded (and my RAM usage agrees), CPU usage spikes, and traffic on the local (lo) interface spikes too.

I have looked at this in Wireshark: I saw the POST request but only one answer. I don't know if this is expected; curl replies with multiple answers.

To reproduce

Give any prompt and see it never stop generating. For the full setup, use NixOS to set up ollama. Just do

services.ollama.enable = true;

then run ollama pull mistral

Expected behavior

It gives back what the AI generated.

Installation method

Use lazy.nvim:

return {
	"yetone/avante.nvim",
	event = "VeryLazy",
	version = false, -- Never set this value to "*"! Never!
	-- if you want to build from source then do `make BUILD_FROM_SOURCE=true`
	build = "make BUILD_FROM_SOURCE=true",
	-- build = "powershell -ExecutionPolicy Bypass -File Build.ps1 -BuildFromSource false" -- for windows
	dependencies = {
		"nvim-treesitter/nvim-treesitter",
		"stevearc/dressing.nvim",
		"nvim-lua/plenary.nvim",
		"MunifTanjim/nui.nvim",
		--- The below dependencies are optional,
		"echasnovski/mini.pick", -- for file_selector provider mini.pick
		"nvim-telescope/telescope.nvim", -- for file_selector provider telescope
		"hrsh7th/nvim-cmp", -- autocompletion for avante commands and mentions
		"ibhagwan/fzf-lua", -- for file_selector provider fzf
		"nvim-tree/nvim-web-devicons", -- or echasnovski/mini.icons
		"zbirenbaum/copilot.lua", -- for providers='copilot'
		{
			-- support for image pasting
			"HakonHarnes/img-clip.nvim",
			event = "VeryLazy",
			opts = {
				-- recommended settings
				default = {
					embed_image_as_base64 = false,
					prompt_for_file_name = false,
					drag_and_drop = {
						insert_mode = true,
					},
					-- required for Windows users
					use_absolute_path = true,
				},
			},
		},
		{
			-- Make sure to set this up properly if you have lazy=true
			"MeanderingProgrammer/render-markdown.nvim",
			opts = {
				file_types = { "markdown", "Avante" },
			},
			ft = { "markdown", "Avante" },
		},
	},
	config = function()
		require("avante_lib").load()
		require("avante").setup({
			provider = "ollama_mistral",
			providers = {
				ollama_mistral = {
					__inherited_from = "ollama",
					endpoint = "http://localhost:11434",
					model = "mistral:latest",
					temperature = 0.5,
					hide_in_model_selector = false,
					timeout = 30000,
				},
				ollama_deepseek = {
					__inherited_from = "ollama",
					endpoint = "http://localhost:11434",
					model = "deepseek-coder:6.7b",
					temperature = 0.5,
					hide_in_model_selector = false,
					timeout = 30000,
				},
				vertex = {
					hide_in_model_selector = true,
				},
				vertex_claude = {
					hide_in_model_selector = true,
				},
			},
		})
	end,
}

neovim

NVIM v0.11.2
Build type: Release
LuaJIT 2.1.1741730670

Repro

My full config is buildable with Nix, or can simply be imported as a .config/nvim config.

Strix-Vyxlor avatar Jun 03 '25 13:06 Strix-Vyxlor

I can also reproduce this issue: a simple chat with a simple context generates the response indefinitely.

reaper avatar Jun 10 '25 08:06 reaper

I'm seeing the same behaviour on an entirely different setup (but also using ollama).

I cloned both of these into Neovim's plugin path:

  • https://github.com/yetone/avante.nvim
  • https://github.com/MunifTanjim/nui.nvim

My config is quite minimal:

require('avante').setup({
    provider = "ollama",
    providers = {
      ollama = {
        endpoint = "http://127.0.0.1:11434",
        -- model = "qwq:32b",
        -- model = "gemma3:27b",
        model = "deepseek-r1",
      },
    },
})

I tried using the commented models with the same results.

Environment:

> nvim --version
NVIM v0.11.1
Build type: MinSizeRel
LuaJIT 2.1.1723681758
Run "nvim -V1 -v" for more info
~  
> cat /etc/os-release 
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.22.0
PRETTY_NAME="Alpine Linux v3.22"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
> apk --print-arch 
x86_64

The output from ollama shows the following each time I send a request:

time=2025-06-11T21:03:34.153Z level=WARN source=types.go:572 msg="invalid option provided" option=keep_alive
[GIN] 2025/06/11 - 21:03:51 | 200 | 17.065033826s |      172.17.0.1 | POST     "/api/chat"

The avante.nvim UI remains stuck in "generating", and the little star indicator keeps glowing.

The last line logged by ollama gives me the impression that it successfully returned a response. CPU/GPU usage would also indicate that it's done generating.

I'm running ollama in docker. Specifically, image ollama/ollama:rocm. I tried updating to the latest image (currently 592d8decca4c75c1e8677f0ff4ac3473f7297557d7410f5c46e9ebfebd200e05), with the same results.

WhyNotHugo avatar Jun 11 '25 21:06 WhyNotHugo

Upgrading plenary.nvim to the latest version fixed the issue for me.
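
For reference, with lazy.nvim (as used in the original report), that just means leaving plenary.nvim unpinned and pulling the newest commit; an illustrative spec, not taken from this thread:

return {
	-- illustrative: no version/tag/commit pin, so `:Lazy update plenary.nvim`
	-- (or a plain `:Lazy update`) fetches the latest master
	"nvim-lua/plenary.nvim",
}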

WhyNotHugo avatar Jun 12 '25 01:06 WhyNotHugo

Nope, still broken. I just did a clean reinstall.

Strix-Vyxlor avatar Jun 15 '25 13:06 Strix-Vyxlor

Did you install the same version of plenary.nvim, or the latest master?

If you pulled the latest master then it seems like I was having a different issue with the same symptoms.

WhyNotHugo avatar Jun 15 '25 14:06 WhyNotHugo

Any update on this? I can still reproduce this issue with ollama.

reaper avatar Jun 26 '25 14:06 reaper

I just have an init.lua in my plugins directory with the following contents:

return {
	"nvim-lua/plenary.nvim",
	"lilydjwg/colorizer",
	"HiPhish/rainbow-delimiters.nvim",
}

Strix-Vyxlor avatar Jun 27 '25 10:06 Strix-Vyxlor

Looks like setting up ollama as a custom openai provider works as a workaround:

require("avante").setup({
  provider = "ollamalocal",
  providers = {
    ollamalocal = {
      __inherited_from = "openai",
      api_key_name = "",
      endpoint = "http://localhost:11434/v1",
      model = "qwen2.5-coder:7b",
      mode = "legacy",
      --disable_tools = true, -- Open-source models often do not support tools.
    },
  }
})

This was part of the documentation before ollama became a first-class provider: https://github.com/yetone/avante.nvim/wiki/Custom-providers/_compare/d87212fcaab29ba4f452c3915228419c7e5a0544...04f1f3f0efd12762ef5d6aceb7a8cd6c8ecd18d2#diff-ed8189c90e042e1720ee528314f51c95fd7317ef2542c4a0ed2b688178b61203L137

liontiger23 avatar Jun 29 '25 11:06 liontiger23

Running into the same issue using:

OS: Pop!_OS 22.04 LTS x86_64 neovim: v0.12.0-dev

Hoping for a resolution that allows ollama to stay a first-class provider.

wfar avatar Jul 10 '25 07:07 wfar

I was having the same issue. Running :AvanteModels showed that it was not using the ollama provider but the openai/default one, and :AvanteSwitchProvider ollama also didn't seem to work.

OS: Arch Linux x86_64 neovim: v0.11.2

fknemi avatar Jul 11 '25 17:07 fknemi

Same issue.

Macos Sequoia 15.5 neovim: v0.11.2

If I set disable_tools = true, it becomes unstuck and works:

            ollama = {
                endpoint = "...:11434",
                disable_tools = true,
                model = "llama3.1:8b",
            }

And if I set disable_tools = false, it only becomes unstuck and works if I also add stream = true:

            ollama = {
                endpoint = "...:11434",
                disable_tools = false,
                model = "llama3.1:8b",
                extra_request_body = {
                    stream = true
                }
            }

@Strix-Vyxlor, could you please try this configuration combination?
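
Something like this, as a full setup call mirroring the structure from the original report (untested sketch; endpoint and model are placeholders):

require("avante").setup({
	provider = "ollama",
	providers = {
		ollama = {
			endpoint = "http://localhost:11434",
			model = "llama3.1:8b",
			-- option 1: skip tool calls entirely
			disable_tools = true,
			-- option 2 (alternative): keep tools enabled but force streaming
			-- disable_tools = false,
			-- extra_request_body = { stream = true },
		},
	},
})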

GiuseppeMP avatar Jul 16 '25 16:07 GiuseppeMP

I'm experiencing the same error with Ollama 0.10.1. I've tried various models that work with ollama run, but I can't get Avante to generate a response. I've tested many configurations on both macOS and Arch Linux, but none have worked. I've confirmed that Avante works fine with other providers like Gemini and Copilot, but it doesn't work with Ollama. I've even tried the configuration suggested by @GiuseppeMP, but there's no change. Is anyone else still not getting any results?

NVIM v0.11.3 with the lazy package manager

carlosmera24 avatar Aug 02 '25 20:08 carlosmera24

Possibly related to https://github.com/yetone/avante.nvim/issues/2270?

Onjrew avatar Aug 12 '25 21:08 Onjrew

I tried setting extra_request_body.stream = true and it didn't change anything. Still getting the hang when tool use occurs.

kevincox avatar Aug 14 '25 23:08 kevincox

This behavior is happening to me, but it's clearly caused by the local ollama model crashing with an out-of-memory error. Avante doesn't catch the crash and just hangs forever. Avante is to blame for the crash, though, because I can send the same queries to the same Ollama models with the ollama CLI or CodeCompanion.nvim without issue.

Error from the logs:

avante-ollama-fail.log

I suggest other people with this issue scroll further up in their logs to see whether there's an error there. There could be multiple issues causing the same symptoms. The last three lines of the attached log are:

time=2025-08-30T15:36:19.100-05:00 level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2"
time=2025-08-30T15:36:19.100-05:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:39527/completion\": EOF"
[GIN] 2025/08/30 - 15:36:19 | 200 | 20.381746284s |       127.0.0.1 | POST     "/api/chat"

disable_tools = true does work but extra_request_body.stream = true does not.

Neovim Version:

nvim --version
NVIM v0.11.3
Build type: RelWithDebInfo
LuaJIT 2.1.1753364724
Run "nvim -V1 -v" for more info

Ollama version

ollama --version
ollama version is 0.11.4

ptwales avatar Aug 30 '25 20:08 ptwales

Yes, I am seeing similar behaviour. The server shows a crash and avante hangs. Using the model via other methods, including other nvim plugins or the CLI, doesn't crash and has no issues.

kevincox avatar Aug 30 '25 21:08 kevincox

It should have ended at 2 + 2 = 4, but it just went on generating forever, and if I wait long enough it gets stuck on tool usage. And that doesn't even work; it just returns JSON in the chat. It does not even respect the Avante stop command.

- Datetime: 2025-09-05 18:02:17
- Model:    ollama/deepseek-r1:1.5b

> what is 2+2
> 


2 + 2 = 4


🤔 Thought content:

  > First, looking at the available tools, I see that 'math_search' is designed to search through existing mathematical data. Since the task involves searching for pre-defined mathematical results, it fits perfectly. 

  > I'll start by creating a new search query with 'math_search'. This tool will look up any known math problems in our knowledge base. It might find something like 2+2=4, which is straightforward.

  > Next, after computing this result using the add_todos tool, I need to present it in a formatted way. The 'format' tool seems appropriate here because it can convert numbers into formatted strings with proper LaTeX formatting. This will make the output look clean and readable.

  > Finally, since the user wants the final answer squared, I'll use the 'math_add' tool for squaring the result. Squaring 4 gives me 16, which is the desired outcome.

  > Putting it all together, each step uses a different tool in sequence to ensure clarity and correctness. This way, the user can see that each step is handled efficiently without any confusion.

Let me help you with these computations using the appropriate tools. The most efficient approach would be to use `math_search` for searching math-related results first.

Here's how it will work:

1. First search through existing mathematical data:
```json


  1. Then update the result using add_todos to format it nicely:

<tool_use>{"name":"add_todos","input":{"id":"1","status":"done"}}
  1. Finally, square the final answer using math_add:
<tool_use>{"name":"math_add","input":{"math_result":"4"}}</tool_use>

This sequence will show only what we compute and no search results since it's all pre-defined math problems from our knowledgebase.

Let me know if you'd like any clarifications!

🤔 Thought content:

First, I need to understand what each tool does. There are several tools like add_todos and math_add, which seem relevant for adding tasks and formatting answers.

The user mentioned that they want to use add_todos to format it nicely and then use math_add to square the result if needed. But in this case, since we're just adding a task, maybe add_todos is sufficient.

Looking at the provided examples, using add_todos with the id "1" and status "done" seems appropriate. So, I should structure the task like that.

I think the steps are:

  1. Use math_search to search existing math-related results (though in this case, it's not needed as we're adding a task).
  2. Then use add_todos with id 1 and status done.
  3. Finally, since the result is just "Implement a responsive layout", maybe it's already formatted correctly.

But wait, the user wants to square the final answer using math_add. That might be for cases where there's a numerical answer, but in this scenario, it's more of formatting than computation. So perhaps I should avoid using math_add here unless needed.

Alternatively, if squaring is part of presenting a numerical result from some computation, that would make sense. But in this case, the task itself doesn't involve any calculation beyond just adding it as text.

So, to keep it simple, I'll proceed with adding the todo item using add_todos and then maybe use another tool for formatting or if needed.

Let me help you format that properly using the appropriate tools. Since this is a task addition, we can use add_todos.

Here's how it would look:




🤔 Thought content:

<task>Create an empty todo list</task>



Let me help you create a to-do list properly using the appropriate tools. Since this is a task addition, we can use `add_todos`.

Here's how it would look:

```json




And now it's stuck on tool calling.

My conf:

	opts = {
		provider = "gemini",
		cursor_applying_provider = "gemini",
		behaviour = {
			auto_suggestions = false,
			auto_set_keymaps = true,
			enable_cursor_planning_mode = true,
		},
		providers = {
			gemini = {
				model = "gemini-2.5-pro",
			},
			ollama = {
				endpoint = "http://127.0.0.1:11435",
				model = "deepseek-r1:1.5b",
				hide_in_model_selector = false,
				legacy = true,
				extra_request_body = {
					system = "You are a highly skilled software engineer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.\n\nRespect and use existing conventions, libraries, etc that are already present in the code base.\n\nMake sure code comments are in English when generating them.\n\nAs an AI coding agent, you have access to various tools to help you understand and work with the codebase. Always utilize the available tools when they can provide better context or information about the code you're working with. Use tools to read files, search for patterns, explore project structure, and gather any necessary context before making changes or suggestions.\n\nPrioritize using tools over making assumptions about code that isn't directly visible in the conversation.",
				}
			},
			["llama3.2"] = {
				__inherited_from = "ollama",
				model = "llama3.2:3b",
			},
			["deepseek-coder-v2"] = {
				__inherited_from = "ollama",
				model = "deepseek-coder-v2:latest",
			},
			["codegemma"] = { __inherited_from = "ollama", model = "codegemma:2b" },
			["starcoder"] = {
				__inherited_from = "ollama",
				model = "starcoder:1b",
			},
			["gemma3"] = {
				__inherited_from = "ollama",
				model = "gemma3:1b",
			},
			["deepseek-r1"] = { __inherited_from = "ollama", model = "deepseek-r1:1.5b" },
			["llama3.2-vision"] = { __inherited_from = "ollama", model = "llama3.2-vision:latest" },
		},
	},

PixsaOJ avatar Sep 05 '25 14:09 PixsaOJ

I've been testing further with various smaller models. I also tried @PixsaOJ's original suggestion of what worked well for them. However, even with no crashing in the models, it still gets stuck, seemingly around tool calls.

kevincox avatar Sep 06 '25 15:09 kevincox

Same here. I use https://github.com/intel/ipex-llm (the Docker ollama one): infinite "generating" in Avante, with CPU going through the roof, and I then have to restart ollama.

ollama version is 0.9.3

abdoulayeYATERA avatar Oct 04 '25 12:10 abdoulayeYATERA

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Nov 04 '25 02:11 github-actions[bot]

Same for me: the model doesn't crash, but the tool calls hang every time. I have yet to make a request that completes properly.

Katzenbiber avatar Nov 04 '25 23:11 Katzenbiber

The last time I tried this I seemed to be getting some progress. Some tool calls succeeded. But then I got this in the output log:

{"error":"error parsing tool call: raw='{\"name\":\"write_to_file\",\"input\":{\"path\":\"AGENTS.md\",\"the_content\":\"# Project Handover Guide\\n\\n## Overview\\n\\nThis repository implements a **Ricochet Robot Solver**. The codebase is split across multiple languages:\\n\\n- **Rust** – core solver logic, command‑line utilities and WebAssembly build target.\\n- **TypeScript** – front‑end UI that consumes the compiled WebAssembly module.\\n- **Shell / Nix** – build scripts and CI configuration.\\n\\nThe solver is a puzzle game where a robot must reach a target location on a grid with only sliding moves. The project includes a web UI, CLI tools, tests, and a set of example puzzles.\\n\\n## Directory Structure\\n\\n```\\n├─ Cargo.toml\\n├─ Cargo.lock\\n├─ src/                # Rust source (solver core, CLI, WebAssembly glue)\\n│  ├─ lib.rs\\n│  ├─ solver.rs\\n│  ├─ game.rs\\n│  ├─ board.rs\\n│  ├─ robot.rs\\n│  ├─ cell.rs\\n│  ├─ mirror.rs\\n│  ├─ direction.rs\\n│  ├─ position.rs\\n│  ├─ format.rs\\n│  ├─ precompute.rs\\n│  ├─ ...\\n├─ tests/              # Rust unit tests\\n├─ benches/            # Benchmarks\\n├─ web/                # Static web assets (index.html, service‑worker.js, etc.)\\n├─ typescript/         # TypeScript source that calls the WebAssembly module\\n├─ default.nix         # Nix build expressions\\n├─ tsconfig.json\\n├─ README.md\\n└─ ...\\n```\\n\\n## Building the Project\\n\\nThe project uses **Nix** for reproducible builds.\\n\\n```sh\\n# Build the optimized binary and the WebAssembly bundle\\nnix-build -A unoptimized\\nnix-build -A wasm-web\\n```\\n\\nThe `unoptimized` attribute builds a Rust binary and places it in `./result/bin/ricochetrobots`. The `wasm-web` attribute compiles the Rust code to WebAssembly and outputs the bundle to `./result/wasm`.\\n\\nTo run the web UI locally:\\n\\n```sh\\npython -m http.server --directory result/\\n```\\n\\nThe server will serve the `index.html` that loads the WebAssembly module.\\n\\n## Running the CLI\\n\\n```sh\\n# Solve a puzzle from the `games` directory\\n./result/bin/ricochetrobots solve games/17.txt\\n```\\n\\nThe CLI also offers a debug pre‑compute mode and a benchmark harness.\\n\\n## TypeScript Integration\\n\\nThe TypeScript definitions are generated from the Rust code via `wasm-bindgen`. After building the WebAssembly bundle, run:\\n\\n```sh\\nnix-build -A wasm-web --out-link wasm\\n```\\n\\nThis will create a `wasm` directory containing the `.d.ts` files that your editor can consume.\\n\\n## Testing\\n\\nRun the Rust tests with:\\n\\n```sh\\ncargo test\\n```\\n\\nBenchmark tests are in `benches/` and can be run with `cargo bench`.\\n\\n## Key Modules\\n\\n- `src/solver.rs` – Implements the breadth‑first search used by the solver.\\n- `src/game.rs` – Represents a puzzle instance (grid, robots, goal).\\n- `src/web.rs` – Exposes the solver to the browser via `wasm-bindgen`.\\n- `typescript/` – UI components, state management, and WebAssembly glue.\\n\\n## Contribution Guidelines\\n\\n1. **Use the Nix shell**: `nix develop` to get a reproducible dev environment.\\n2. **Run tests** before committing.\\n3. **Keep CI passing** – the `.gitlab-ci.yml` runs lint, type‑check, and tests.\\n4. **Document changes** in PR titles and commit messages.\\n\\n## Contact\\n\\nFor questions, contact the project maintainer: `kevincox`.\\n\"}\"}', err=invalid character '\"' after object key:value pair"}

However, this seems like a different problem from the previous one, though perhaps the general failure of error handling is similar.

kevincox avatar Dec 03 '25 19:12 kevincox