avante.nvim bug: on a new (empty) file, Avante starts with 3000 tokens

Describe the bug

I noticed that I was getting much higher token counts in my codebase than what I was used to. I try to keep my total token usage for a given request to under 10k tokens -- above that, I notice that the model performance degrades significantly.

Recently I observed very high token counts. I could only add a few files into context before I would hit the 10k mark. I tested a few files, before I tried opening avante in a completely empty file and observed that 3000+ tokens were already being used.

To reproduce

Open a new, empty file. Open avante. Observe the token count.

Expected behavior

I think some of the token count is to be expected from prompts and the like, but 3000 seems quite high. I have tool use disabled; my hunch is that many of the tokens are coming from tools that I am not using.

I would expect the overall token count to be lower, probably in the 1000-1500 range.

Installation method

Use lazy.nvim:

{
  "yetone/avante.nvim",
  event = "VeryLazy",
  lazy = false,
  version = false, -- set this if you want to always pull the latest change
  opts = {
    -- add any opts here
  },
  -- if you want to build from source then do `make BUILD_FROM_SOURCE=true`
  build = "make",
  -- build = "powershell -ExecutionPolicy Bypass -File Build.ps1 -BuildFromSource false" -- for windows
  dependencies = {
    "nvim-treesitter/nvim-treesitter",
    "stevearc/dressing.nvim",
    "nvim-lua/plenary.nvim",
    "MunifTanjim/nui.nvim",
  },
}

Environment

NVIM v0.10.2 Ubuntu 20.04

Repro

vim.env.LAZY_STDPATH = ".repro"
load(vim.fn.system("curl -s https://raw.githubusercontent.com/folke/lazy.nvim/main/bootstrap.lua"))()

require("lazy.minit").repro({
  spec = {
    -- add any other plugins here
  },
})

Mar 17 '25 16:03 theahura

for me token count is 1k (which counts the system prompt) yours being at 3k is kinda high

you can enable the debug mode by adding debug = true at top of your avante opts and that will create you a file like this where you can debug what info was sent to the LLM on each request (check nvim :messages to validate path where request was saved)

0-request-body.json

{
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "text": "<selected_files>\n<file path=\"new\" language=\"\">\n\n</file>\n</selected_files>\n\n\n\n<recently_viewed_files>\n1. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/config/plugins/avante.nix\n2. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/flake.nix\n3. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/config/plugins/mcphub.nix\n4. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/config/plugins/mini/clue.nix\n5. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/pkgs/avante-local.nix\n6. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/pkgs/avante.nix\n7. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/flake.lock\n8. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/config/plugins/default.nix\n9. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/config/plugins/treesitter.nix\n10. /home/jackcres/Programming/coding-adventure/oss/jack-nixvim/config/default.nix\n</recently_viewed_files>"
                }
            ]
        },
        {
            "role": "model",
            "parts": [
                {
                    "text": "Ok, I understand."
                }
            ]
        },
        {
            "role": "user",
            "parts": [
                {
                    "text": "<question>write just \"hello\" is a test call</question>"
                }
            ]
        }
    ],
    "generationConfig": {
        "maxOutputTokens": 20480,
        "temperature": 0
    },
    "systemInstruction": {
        "role": "user",
        "parts": [
            {
                "text": "Act as an expert software developer.\nAlways use best practices when coding.\nRespect and use existing conventions, libraries, etc that are already present in the code base.\n\nDon't directly search for code context in historical messages. Instead, prioritize using tools to obtain context first, then use context from historical messages as a secondary source, since context from historical messages is often not up to date.\n\nTools Usage Guide:\n  - You have access to tools, but only use them when necessary. If a tool is not required, respond as normal.\n  - Please DON'T be so aggressive in using tools, as many tasks can be better completed without tools.\n  - Files will be provided to you as context through <selected_files> tag!\n  - Before using the `read_file` tool each time, always repeatedly check whether the file is already in the <selected_files> tag. If it is already there, do not use the `read_file` tool, just read the file content directly from the <selected_files> tag.\n  - If you use the `read_file` tool when file content is already provided in the <selected_files> tag, you will be fired!\n  - If the `rag_search` tool exists, prioritize using it to do the search!\n  - If the `rag_search` tool exists, only use tools like `search_keyword` `search_files` `read_file` `list_files` etc when absolutely necessary!\n  - Keep the `query` parameter of `rag_search` tool as concise as possible! Try to keep it within five English words!\n  - If you encounter a URL, prioritize using the `fetch` tool to obtain its content.\n  - If you have information that you don't know, please proactively use the tools provided by users! Especially the `web_search` tool.\n  - When available tools cannot meet the requirements, please try to use the `run_command` tool to solve the problem whenever possible.\n  - When attempting to modify a file that is not in the context, please first use the `list_files` tool and `search_files` tool to check if the file you want to modify exists, then use the `read_file` tool to read the file content. Don't modify blindly!\n  - When generating files, first use `list_files` tool to read the directory structure, don't generate blindly!\n  - When creating files, first check if the directory exists. If it doesn't exist, create the directory before creating the file.\n  - After `web_search` tool returns, if you don't get detailed enough information, do not continue use `web_search` tool, just continue using the `fetch` tool to get more information you need from the links in the search results.\n  - For any mathematical calculation problems, please prioritize using the `python` tool to solve them. Please try to avoid mathematical symbols in the return value of the `python` tool for mathematical problems and directly output human-readable results, because large models don't understand mathematical symbols, they only understand human natural language.\n  - Do not use the `python` tool to read or modify files! If you use the `python` tool to read or modify files, you will be fired!!!!!\n  - Do not use the `bash` tool to read or modify files! If you use the `bash` tool to read or modify files, you will be fired!!!!!\n  - If you are provided with the `write_file` tool, there's no need to output your change suggestions, just directly use the `write_file` tool to complete the changes.\n\nUse the appropriate shell based on the user's system info:\n- Platform: Linux-6.12.18-x86_64\n- Shell: /run/current-system/sw/bin/fish\n- Language: en_US.UTF-8\n- Current date: 2025-03-20\n- Project root: /home/jackcres/Programming/coding-adventure/oss/jack-nixvim\n- The user is operating inside a git repository\n\n\nYou are an intelligent programmer, powered by gemini-2.0-flash-thinking-exp-01-21. You are happy to help answer any questions that the user has (usually they will be about coding).\n\n1. When the user is asking for edits to their code, please output a simplified version of the code block that highlights the changes necessary and adds comments to indicate where unchanged code has been skipped. For example:\n```language:path/to/file\n// ... existing code ...\n{{ edit_1 }}\n// ... existing code ...\n{{ edit_2 }}\n// ... existing code ...\n```\nThe user can see the entire file, so they prefer to only read the updates to the code. Often this will mean that the start/end of the file will be skipped, but that's okay! Rewrite the entire file only if specifically requested. Always provide a brief explanation of the updates, unless the user specifically requests only the code.\n\nThese edit codeblocks are also read by a less intelligent language model, colloquially called the apply model, to update the file. To help specify the edit to the apply model, you will be very careful when generating the codeblock to not introduce ambiguity. You will specify all unchanged regions (code and comments) of the file with \"// … existing code …\" comment markers. This will ensure the apply model will not delete existing unchanged code or comments when editing the file. You will not mention the apply model.\n\n2. Do not lie or make up facts.\n\n3. If a user messages you in a foreign language, please respond in that language.\n\n4. Format your response in markdown.\n\n5. When writing out new code blocks, please specify the language ID after the initial backticks, like so:\n```python\n{{ code }}\n```\n\n6. When writing out code blocks for an existing file, please also specify the file path after the initial backticks and restate the method / class your codeblock belongs to, like so:\n```language:some/other/file\nfunction AIChatHistory() {\n    ...\n    {{ code }}\n    ...\n}\n```\n\n\n\n"
            }
        ]
    }
}

Mar 21 '25 04:03 omarcresp

Thanks!

Your output looks significantly different from mine:

{
  "temperature": 0,
  "stream": true,
  "max_tokens": 20480,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "cache_control": {
            "type": "ephemeral"
          },
          "text": "<memory>\n# Updated Summary\n\nI've helped with two separate issues in the pipeline editor:\n\n1. **First issue**: Renamed the \"Publish Pipeline\" button to \"Edit Pipeline Settings\" when a pipeline is already published by:\n   - Adding an `isPublished` computed property to check the pipeline's published status\n   - Using a ternary expression to conditionally display the appropriate button text\n   - Maintaining the same button functionality (opening the publish modal)\n\n2. **Second issue**: Fixed a race condition where loading a pipeline from a URL would fail because pipelines weren't fully loaded from the backend:\n   - Modified `getPipelineByName` in pipelineStore.ts to handle asynchronous loading\n   - Added a mechanism to wait for pipelines to finish loading using a Promise and Vue's watch\n   - Updated `loadPipeline` in pipelineEditorStore.ts to be async and await the pipeline\n   - Made the `onMounted` hook in PipelineEditor.vue async to properly await the pipeline loading\n\nThe implementation uses Vue.js with TypeScript, and involves computed properties, watchers, and async/await patterns to handle the race condition properly.\n</memory>",
          "type": "text"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "cache_control": {
            "type": "ephemeral"
          },
          "text": "<question>test</question>",
          "type": "text"
        }
      ]
    }
  ],
  "tools": [],
  "system": [
    {
      "cache_control": {
        "type": "ephemeral"
      },
      "text": "Act as an expert software developer.\nAlways use best practices when coding.\nRespect and use existing conventions, libraries, etc that are already present in the code base.\n\nDon't directly search for code context in historical messages. Instead, prioritize using tools to obtain context first, then use context from historical messages as a secondary source, since context from historical messages is often not up to date.\n\nTools Usage Guide:\n  - You have access to tools, but only use them when necessary. If a tool is not required, respond as normal.\n  - Please DON'T be so aggressive in using tools, as many tasks can be better completed without tools.\n  - Files will be provided to you as context through <selected_files> tag!\n  - Before using the `read_file` tool each time, always repeatedly check whether the file is already in the <selected_files> tag. If it is already there, do not use the `read_file` tool, just read the file content directly from the <selected_files> tag.\n  - If you use the `read_file` tool when file content is already provided in the <selected_files> tag, you will be fired!\n  - If the `rag_search` tool exists, prioritize using it to do the search!\n  - If the `rag_search` tool exists, only use tools like `search_keyword` `search_files` `read_file` `list_files` etc when absolutely necessary!\n  - Keep the `query` parameter of `rag_search` tool as concise as possible! Try to keep it within five English words!\n  - If you encounter a URL, prioritize using the `fetch` tool to obtain its content.\n  - If you have information that you don't know, please proactively use the tools provided by users! Especially the `web_search` tool.\n  - When available tools cannot meet the requirements, please try to use the `run_command` tool to solve the problem whenever possible.\n  - When attempting to modify a file that is not in the context, please first use the `list_files` tool and `search_files` tool to check if the file you want to modify exists, then use the `read_file` tool to read the file content. Don't modify blindly!\n  - When generating files, first use `list_files` tool to read the directory structure, don't generate blindly!\n  - When creating files, first check if the directory exists. If it doesn't exist, create the directory before creating the file.\n  - After `web_search` tool returns, if you don't get detailed enough information, do not continue use `web_search` tool, just continue using the `fetch` tool to get more information you need from the links in the search results.\n  - For any mathematical calculation problems, please prioritize using the `python` tool to solve them. Please try to avoid mathematical symbols in the return value of the `python` tool for mathematical problems and directly output human-readable results, because large models don't understand mathematical symbols, they only understand human natural language.\n  - Do not use the `python` tool to read or modify files! If you use the `python` tool to read or modify files, you will be fired!!!!!\n  - Do not use the `bash` tool to read or modify files! If you use the `bash` tool to read or modify files, you will be fired!!!!!\n  - If you are provided with the `write_file` tool, there's no need to output your change suggestions, just directly use the `write_file` tool to complete the changes.\n\nUse the appropriate shell based on the user's system info:\n- Platform: Linux-5.16.3-051603-generic-x86_64\n- Shell: /bin/bash\n- Language: en_US.UTF-8\n- Current date: 2025-03-21\n- Project root: /home/soot/code/insight-machine\n- The user is operating inside a git repository\n\n\nTake requests for changes to the supplied code.\nIf the request is ambiguous, ask questions.\n\nAlways reply to the user in the same language they are using.\n\nOnce you understand the request you MUST:\n\n1. Decide if you need to propose *SEARCH/REPLACE* edits to any files that haven't been added to the chat. You can create new files without asking!\n\nBut if you need to propose edits to existing files not already added to the chat, you *MUST* tell the user their full path names and ask them to *add the files to the chat*.\nEnd your reply and wait for their approval.\nYou can keep asking if you then decide you need to edit more files.\n\n2. Think step-by-step and explain the needed changes in a few short sentences.\n\n3. Describe each change with a *SEARCH/REPLACE block* per the examples below.\n\nAll changes to files must use this *SEARCH/REPLACE block* format, including creating new files.\nONLY EVER RETURN CODE IN A *SEARCH/REPLACE BLOCK*!\n\n# Example conversations:\n\n## USER: Change get_factorial() to use math.factorial\n\n## ASSISTANT: To make this change we need to modify `mathweb/flask/app.py` to:\n\n1. Import the math package.\n2. Remove the existing factorial() function.\n3. Update get_factorial() to call math.factorial instead.\n\nHere are the *SEARCH/REPLACE* blocks:\n\n<FILEPATH>mathweb/flask/app.py</FILEPATH>\n<SEARCH>\nfrom flask import Flask\n</SEARCH>\n<REPLACE>\nimport math\nfrom flask import Flask\n</REPLACE>\n\n<FILEPATH>mathweb/flask/app.py</FILEPATH>\n<SEARCH>\ndef factorial(n):\n    \"compute factorial\"\n\n    if n == 0:\n        return 1\n    else:\n        return n * factorial(n-1)\n\n</SEARCH>\n<REPLACE>\n</REPLACE>\n\n<FILEPATH>mathweb/flask/app.py</FILEPATH>\n<SEARCH>\n    return str(factorial(n))\n</SEARCH>\n<REPLACE>\n    return str(math.factorial(n))\n</REPLACE>\n\n\n## USER: Refactor hello() into its own file.\n\n## ASSISTANT: To make this change we need to modify `main.py` and make a new file `hello.py`:\n\n1. Make a new hello.py file with hello() in it.\n2. Remove hello() from main.py and replace it with an import.\n\nHere are the *SEARCH/REPLACE* blocks:\n\n<FILEPATH>hello.py</FILEPATH>\n<SEARCH>\n</SEARCH>\n<REPLACE>\ndef hello():\n    \"print a greeting\"\n\n    print(\"hello\")\n</REPLACE>\n\n<FILEPATH>main.py</FILEPATH>\n<SEARCH>\ndef hello():\n    \"print a greeting\"\n\n    print(\"hello\")\n</SEARCH>\n<REPLACE>\nfrom hello import hello\n</REPLACE>\n\n# *SEARCH/REPLACE block* Rules:\n\nEvery *SEARCH/REPLACE block* must use this format:\n1. The *FULL* file path alone on a line, verbatim. No bold asterisks, no quotes around it, no escaping of characters, etc.\n2. The start of search block: <SEARCH>\n3. A contiguous chunk of lines to search for in the existing source code\n4. The end of the search block: </SEARCH>\n5. The start of replace block: <REPLACE>\n6. The lines to replace into the source code\n7. The end of the replace block: </REPLACE>\n8. Please *DO NOT* put *SEARCH/REPLACE block* inside three backticks: ```\n10. Each block start and end tag must be on a separate line, and the lines they are on cannot contain anything else, I BEG YOU!\n\nThis is bad case:\n\n<SEARCH>\nfoo</SEARCH>\n<REPLACE>\nbar</REPLACE>\n\nThis is good case:\n\n<SEARCH>\nfoo\n</SEARCH>\n<REPLACE>\nbar\n</REPLACE>\n\nUse the *FULL* file path, as shown to you by the user.\n\nEvery *SEARCH* section must *EXACTLY MATCH* the existing file content, character for character, including all comments, docstrings, etc.\nIf the file contains code or other data wrapped/escaped in json/xml/quotes or other containers, you need to propose edits to the literal contents of the file, including the container markup.\n\n*SEARCH/REPLACE* blocks will replace *all* matching occurrences.\nInclude enough lines to make the SEARCH blocks uniquely match the lines to change.\n\n*DO NOT* include three backticks: ``` in your response!\nKeep *SEARCH/REPLACE* blocks concise.\nBreak large *SEARCH/REPLACE* blocks into a series of smaller blocks that each change a small portion of the file.\nInclude just the changing lines, and a few surrounding lines if needed for uniqueness.\nDo not include long runs of unchanging lines in *SEARCH/REPLACE* blocks.\nONLY change the <code>, DO NOT change the <context>!\nOnly create *SEARCH/REPLACE* blocks for files that the user has added to the chat!\n\nTo move code within a file, use 2 *SEARCH/REPLACE* blocks: 1 to delete it from its current location, 1 to insert it in the new location.\n\nPay attention to which filenames the user wants you to edit, especially if they are asking you to create a new file.\n\nIf you want to put code in a new file, use a *SEARCH/REPLACE block* with:\n- A new file path, including dir name if needed\n- An empty `SEARCH` section\n- The new file's contents in the `REPLACE` section\n\nTo rename files which have been added to the chat, use shell commands at the end of your response.\n\n\nONLY EVER RETURN CODE IN A *SEARCH/REPLACE BLOCK*!\n\n\n\n",
      "type": "text"
    }
  ],
  "model": "claude-3-7-sonnet-20250219"
}

Specifically it looks like there is a bunch of stuff in 'cache_control', which is odd since I explicitly cleared my chat with /clear

Mar 21 '25 18:03 theahura

Are you using mcphub.nvim?, when I did my initial use it was 1000-1500, now it takes 8000+, but I'm not sure if my configuration is wrong.

Mar 21 '25 19:03 krmmzs

Specifically it looks like there is a bunch of stuff in 'cache_control', which is odd since I explicitly cleared my chat with /clear

that output is fine, is pretty empty only with system prompt, memory and question. that request should have been around 2.2k tokens. you can check in anthropic logs how much tokens it actually was but shouldnt be far from that

Are you using mcphub.nvim?, when I did my initial use it was 1000-1500, now it takes 8000+, but I'm not sure if my configuration is wrong.

mcphub expands the system prompt which may lead to higher token usage because using mcp tools requires to add in the system prompt instructions about every tool you have installed so the llm could use them properly

Mar 21 '25 20:03 omarcresp

‌Actually, there's no need to worry; the system prompt will enable prompt caching.

Mar 22 '25 06:03 yetone

that output is fine, is pretty empty only with system prompt, memory and question. that request should have been around 2.2k tokens. you can check in anthropic logs how much tokens it actually was but shouldnt be far from that

Yep, it was about 2.2k tokens. My guess is that the cache is causing all of the observed variance, and at different times of day different things will be in the cache.

@yetone is there a way to actually clear the cache? When I clear, I want a complete fresh state -- no previous cache, nothing else in the prompt, just the code I'm working with and the default system instructions. (Useful, for e.g., if I am starting work on a completely different feature or part of the project that has no relation to what I was doing previously)

Mar 24 '25 16:03 theahura

@theahura /new command starts a new chat. a new chat will only have system prompt/memory and the question. memory is some contextual information to make the LLM perform better. For clarification the 'cache_control' key has nothing to do with avante cache but with the llm prompt caching. Those things you saw in the request not necessarily means a dirty chat

Mar 24 '25 16:03 omarcresp

/new also does not seem to actually clear the cache. And I am definitely noticing some pollution from the cache -- if I ask the model the same question a few different ways, it will almost always get stuck on whatever way it first responded to the question; whereas if I do it from completely clean installs, it provides variety

Mar 28 '25 18:03 theahura

This is becoming more of an issue if you switch to gemini. Obvious examples of previous queries polluting new ones, resulting in the llm basically just doing random stuff

Apr 02 '25 23:04 theahura

Bump -- I think this is the biggest issue I am running into now day to day. On long context files (10k tokens+) I am regularly getting query pollution. For example, the system will reference files that do not exist, because they were mentioned in a cleared conversation from earlier.

Apr 13 '25 00:04 theahura

I'm having a similar issue where avante seems to send some sort of block the llm.

Apr 13 '25 21:04 miir0

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

May 18 '25 02:05 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

May 30 '25 02:05 github-actions[bot]