
Positron Assistant: getProjectTree tool should ignore venv

Open wch opened this issue 7 months ago • 4 comments

System details:

Positron and OS details:

Positron Version: 2025.06.0 (Universal) build 136
Code - OSS Version: 1.100.0
Commit: 20a61568915619cfd6f7690f8cad3a3ad58f55fa
Date: 2025-05-23T03:33:22.916Z
Electron: 34.5.1
Chromium: 132.0.6834.210
Node.js: 20.19.0
V8: 13.2.152.41-electron.0
OS: Darwin arm64 24.4.0

Describe the issue:

The getProjectTree tool can see files in venv, and this could include a large number of files that shouldn't be seen and fill up context.

Steps to reproduce the issue:

  1. Create a directory named venv
  2. Put some files in it
  3. Ask Positron Assistant, "What files are in venv?"

It will list the files in there.

The tool does not accept any arguments; it sends all the names of all files in the workspace, except for the ones that are on its exclude list, so there isn't a way to avoid sending the files in venv.

Related: there should be some way to limit the number of filenames sent. @andrie ran into an issue where he was hitting an Anthropic rate limit because the tool was sending so much information to the LLM. (I don't believe he had a venv dir, but there were many other filenames being sent.)

wch avatar May 23 '25 19:05 wch

This is also related to https://github.com/posit-dev/positron/issues/7724 with excluding files from inline completion. We probably want similar exclusions for inline completion and the getProjectTree tool.

timtmok avatar May 23 '25 20:05 timtmok

We need to be extremely protective of the context; Tier 1 (the lowest) customers for Anthropic only get 40K input tokens per minute. Our Posit API keys are more like 2M tokens per minute, so we never see these problems ourselves.

jcheng5 avatar May 23 '25 20:05 jcheng5

In case it helps, I do in fact have a .venv folder in the project where the problem occurred. I requested the databot to inspect the data folder, but it sent the entire contents of the project root to Claude.

andrie avatar May 23 '25 20:05 andrie

Maybe the tool should do something like this:

  • Get a list of all files
  • If there are more than N files, send to the LLM up to M files in the top dir, and a list of subdirectories and how many files are in each subdir.

Then the LLM can choose to look in specific directories.

This also means that the tool should allow specifying directories to look in -- maybe add include and exclude arguments?
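A minimal sketch of the idea above, for illustration only. The function name summarize_tree and the parameters max_files (N) and max_top_level (M) are hypothetical and are not part of the actual getProjectTree tool; paths are assumed to be workspace-relative and to use forward slashes.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_tree(paths, max_files=500, max_top_level=50):
    """Return the full listing if it is small, otherwise a top-level summary.

    max_files plays the role of N and max_top_level the role of M.
    Both defaults are placeholders, not values used by the real tool.
    """
    paths = sorted(paths)
    if len(paths) <= max_files:
        return {"files": paths, "subdirs": {}, "truncated": False}

    top_files = []
    subdir_counts = Counter()
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) == 1:
            # File directly in the top-level directory.
            top_files.append(p)
        else:
            # Count the file under its first path component.
            subdir_counts[parts[0]] += 1

    return {
        "files": top_files[:max_top_level],
        "subdirs": dict(subdir_counts),
        "truncated": True,
    }
```

With a result shaped like this, the LLM would see only the per-directory counts for large projects and could then ask for a specific directory in a follow-up call.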

wch avatar May 23 '25 21:05 wch

I discovered that my problem happened because my project contains a python .venv folder as well as an R renv folder. Most of the payload that described the tree view was in fact the renv folder contents.

The tool should possibly ignore or truncate contents of renv.

andrie avatar Jun 01 '25 19:06 andrie

Summary

  • .venv was already being ignored, but venv and renv were not.
  • PR: https://github.com/posit-dev/positron/pull/7997 -- now ignores venv and renv along with some other glob patterns by default
  • the project tree tool now takes various input parameters
  • anecdotally, the project tree tool may be a bit faster now
  • there is now an Output log for Assistant
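As a rough illustration of the default-exclusion behavior described above (this is not the code from the PR, which uses glob patterns): the sketch below skips any path that sits inside a directory named .venv, venv, or renv. The names DEFAULT_EXCLUDED_DIRS and is_excluded are hypothetical, and paths are assumed to be workspace-relative with forward slashes.

```python
from pathlib import PurePosixPath

# Hypothetical default list; the real tool's patterns may differ.
DEFAULT_EXCLUDED_DIRS = frozenset({".venv", "venv", "renv"})


def is_excluded(path, excluded_dirs=DEFAULT_EXCLUDED_DIRS):
    """Return True if any directory component of path is excluded."""
    # parts[:-1] drops the final component so that a file such as
    # src/venv.py is not excluded just because of its name.
    directories = PurePosixPath(path).parts[:-1]
    return any(part in excluded_dirs for part in directories)


def filter_paths(paths, excluded_dirs=DEFAULT_EXCLUDED_DIRS):
    """Keep only the paths that are not inside an excluded directory."""
    return [p for p in paths if not is_excluded(p, excluded_dirs)]
```

Matching on directory components rather than on a string prefix means a nested pkg/renv/ directory is skipped as well, while similarly named files are kept.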

Looking for feedback on

  • tool input parameters (too many? causing the model to make mistakes?)
  • the default maximum files included in the tree is 500 files, which can be overridden in the input parameters -- how does this feel? Should this number be adjusted?
  • empty folders are not captured in the project tree -- is this an issue?
  • any other inclusion/exclusion of files/folders that is unexpected?
  • see https://github.com/posit-dev/positron/pull/7997 for more comments around the project tree experience

sharon-wang avatar Jun 10 '25 16:06 sharon-wang

Great, I'm glad there's a fix in for this!

Feedback for these questions:

  • tool input parameters (too many? causing the model to make mistakes?)

I think the number of parameters is reasonable -- my sense is that it's not too many for LLMs to handle correctly, but if we experience otherwise, this may be worth revisiting. It also helps that all of the params are optional.

  • the default maximum files included in the tree is 500 files, which can be overridden in the input parameters -- how does this feel? Should this number be adjusted?

500 feels a bit high to me. I know this may seem a bit strange, but what about limiting it by number of characters? For example, maybe all of the string lengths added together could be at max 4000 characters.
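A minimal sketch of a character-based limit, assuming the tool has the paths as a list of strings. The name take_within_budget is hypothetical, and 4000 is only the example figure from this comment.

```python
def take_within_budget(paths, max_chars=4000):
    """Keep paths in order until their combined length would exceed max_chars.

    Returns the kept paths and the number of paths that were left out,
    so the caller can tell the LLM that the listing was truncated.
    """
    kept = []
    used = 0
    for p in paths:
        if used + len(p) > max_chars:
            break
        kept.append(p)
        used += len(p)
    return kept, len(paths) - len(kept)
```

A character budget tracks the size of the payload more closely than a file count does, because 500 deeply nested paths can be much longer than 500 short ones.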

  • empty folders are not captured in the project tree -- is this an issue?

I think this is OK. The LLM might get a little confused if it tries to create a directory that already exists, but LLMs can generally recover gracefully from this. However, I think the tool description should inform the LLM of this behavior to reduce confusion when this happens.

  • any other inclusion/exclusion of files/folders that is unexpected?

It would be nice for the tool to be able to list the contents within a subdir. I think it is relatively common for users to be working on part of a project within a subdirectory, and in that case, it would be helpful to limit the results to just that subdir.

wch avatar Jun 10 '25 18:06 wch

Verified Fixed

Positron Version(s): 2025.07.0-84
OS Version(s): Windows 11

Test scenario(s)

  • Created a project with venv, renv, and .venv directories and several additional files.
  • Asked the assistant "Tell me about this project" (Claude 4 Sonnet). The assistant correctly ignored those directories and told me about the rest of the project.

jonvanausdeln avatar Jun 11 '25 17:06 jonvanausdeln

  • tool input parameters (too many? causing the model to make mistakes?)

I think the number of parameters is reasonable -- my sense is that it's not too many for LLMs to handle correctly, but if we experience otherwise, this may be worth revisiting. It also helps that all of the params are optional.

👌

  • the default maximum files included in the tree is 500 files, which can be overridden in the input parameters -- how does this feel? Should this number be adjusted?

500 feels a bit high to me. I know this may seem a bit strange, but what about limiting it by number of characters? For example, maybe all of the string lengths added together could be at max 4000 characters.

opened https://github.com/posit-dev/positron/issues/8344

  • empty folders are not captured in the project tree -- is this an issue?

I think this is OK. The LLM might get a little confused if it tries to create a directory that already exists, but they can generally recover gracefully from this. However, I think the description should inform the LLM of this behavior to reduce confusion when this happens.

"Empty folders are not included in the tree." is part of the getProjectTree tool description:

https://github.com/posit-dev/positron/blob/1797942e9d3c7768aad1abccf88147a1362c1326/extensions/positron-assistant/package.json#L342

We can keep an eye on whether this is sufficient or whether the LLM still runs into confusion!

  • any other inclusion/exclusion of files/folders that is unexpected?

It would be nice for the tool to be able to list the contents within a subdir. I think it is relatively common for users to be working on part of a project within a subdirectory, and in that case, it would be helpful to limit the results to just that subdir.

opened https://github.com/posit-dev/positron/issues/8345

sharon-wang avatar Jun 30 '25 18:06 sharon-wang