servers icon indicating copy to clipboard operation
servers copied to clipboard

Rename "search_files" to "search_files_by_name". Add "search_file_contents"

Open sebastien-rosset opened this issue 8 months ago • 8 comments
trafficstars

Description

  1. Fix #896: "Search file by name" enhancements
    1. Rename the existing search_files function to search_files_by_name
    2. Improve performance for deeply nested directories by processing directories in parallel batches.
    3. Improve description to make it clear it's a file/directory name matcher, not a file content search.
    4. Improve error handling
  2. Add new search_file_contents to search file content.
    1. Plain text or regex pattern matching
    2. Case-sensitive/insensitive search
    3. Configurable result limits
    4. Context lines around matches
    5. File glob filtering (include/exclude)
  3. Better parameter documentation throughout
  4. Fix #735, #1067: sometimes the LLM was confused what is the exact syntax for file name matching. Is it substring match, regex match, glob pattern or something else? It wasn't very clear from the name and description.

Server Details

  • Server: filesystem
  • Changes to: tools

Motivation and Context

Both Claude and human users may misinterpret the search_files function as a content search tool (like grep) rather than a file/directory name matcher. The suggested description makes the distinction clear from the beginning and explicitly notes that it doesn't search file contents.

How Has This Been Tested?

Tested with Claude Desktop with the following prompts, before and after the change.

  1. The "Before" column shows which function was selected by the LLM before this PR.
  2. The "After" column shows which function was selected by the LLM with this PR, for the exact same prompt.
Prompt Before After
Find all references to "XYZ" under
/Users/my_userid/my_project/
search_files search_file_content
Find files whose names contain "color" under
/Users/my_userid/my_project/
search_files search_files_by_name
Search files that contain "XYZ" under
/Users/my_userid/my_project using filesystem MCP.
search_files search_file_content
Search "XYZ" under
/Users/my_userid/my_project using filesystem MCP.
search_files search_file_content
Search "xyz" in file a/b/c/foo.cpp search_files search_file_content

For prompt 1, the LLM can easily infer it should be searching for file contents, not matching file names. Previously, the LLM mistakenly invoked the search_files function, which typically would return no matching results.

For prompts 3 and 4, there is intentionally some prompt ambiguity to test how the LLM decide whether to use search_files_by_name or search_file_content based on the function names and descriptions.

Breaking Changes

Will users need to update their MCP client configurations?

No, but I'm not sure I understand breaking changes. I'm unclear whether it's ok to rename a function, or if it's considered to be a breaking change.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation update

Checklist

  • [x] I have read the MCP Protocol Documentation
  • [x] My changes follows MCP security best practices
  • [x] I have updated the server's README accordingly
  • [x] I have tested this with an LLM client
  • [x] My code follows the repository's style guidelines
  • [ ] New and existing tests pass locally
  • [x] I have added appropriate error handling
  • [x] I have documented all environment variables and configuration options

Additional context

Tests for the search_file_contents function

  • [x] Basic text search returns correct results for common terms
  • [x] Case-sensitive search correctly differentiates between uppercase and lowercase patterns
  • [x] Regex pattern search correctly identifies patterns like function declarations async function \w+\(
  • [x] Context lines display works correctly with different values (0, 2, 4 lines)
  • [x] Max results parameter properly limits the number of results returned
  • [x] File extension filtering correctly includes only the specified file types (.ts, .js, etc.)
  • [x] File extension filtering handles files without extensions correctly
  • [x] Exclude patterns with wildcards (**/*.test.ts) correctly filter out matching files
  • [x] Exclude patterns with simple strings (*config*) correctly filter content containing those strings
  • [x] Combined search with regex + case sensitivity + file types works correctly
  • [x] Search handles large files efficiently without excessive memory usage
  • [x] Directory recursion works correctly for nested file structures
  • [x] Error handling for inaccessible files or directories works as expected
  • [x] Race conditions are avoided when searching directories in parallel
  • [x] Files with non-UTF8 encoding are handled gracefully without crashing

sebastien-rosset avatar Mar 17 '25 21:03 sebastien-rosset

Another typically test scenario is when refactoring code. I've noticed Claude typically tries to read a file if the user has provided the file name, as in the example below:

I'll help you rewrite the message so that the configuration parameter can be properly translated. Let me first examine the file to understand the context better. View result from read_file from filesystem (local) Result too long, truncated to 100000 characters

If the file is too large, then the LLM tries to search content within the file(s). Previously, it was invoking the search_file function, but that always returned empty results, because Claude was confused about the purpose of the search_file function. With this PR, the LLM correctly invokes search_file_contents:

Now let me look for any references to translating the parameter name: View result from search_file_contents from filesystem (local)

This makes it possible to have better discovery of a code base, e.g.:

View result from list_allowed_directories from filesystem (local) Let me look at the xxxx.cpp file to understand the current constructor implementation: View result from read_file from filesystem (local) Now I'll look at the xxxx.h file to understand the class definitions better: View result from read_file from filesystem (local) Let's look for all header files related to the xxxx plugin: View result from search_files_by_name from filesystem (local) Let's try another approach and see what files are in the src directory View result from list_directory from filesystem (local) Now let's look for the corresponding header files: View result from list_directory from filesystem (local) Great! Now let's look at the zzzz.h file to understand how the xxxx class is defined: View result from read_file from filesystem (local) Now let's examine the xxxx.cpp file to see how the constructor is implemented: View result from read_file from filesystem (local) Now that I have a good understanding of the codebase, I can modify xxxx ...

sebastien-rosset avatar Mar 17 '25 22:03 sebastien-rosset

I've noticed sometimes Claude invokes search_files_by_name with the following input:

{
  `path`: `/Users/xxxx/myproject/src`,
  `pattern`: `.h`
}
[{"type": "text", "text": "No matches found"}]

I improved the description of the pattern argument to make it clear this is a substring match, not a glob and not a regex.

sebastien-rosset avatar Mar 19 '25 00:03 sebastien-rosset

I've noticed that Claude AI often tries to invoke the search functions with a regex, most likely because the name of the input argument is "pattern". I've added a useRegex argument to make it more explicit.

sebastien-rosset avatar Apr 07 '25 16:04 sebastien-rosset

Thanks for this, I came here with the same issue that you've solved here. I was going to suggest find_files and grep_files as the names of the tools here, to match the names of the tools that they behave like, but if by_name and contents is enough to make it clear to Claude what to do then I'm happy with it.

FalacerSelene avatar Apr 24 '25 10:04 FalacerSelene

Thanks for this, I came here with the same issue that you've solved here. I was going to suggest find_files and grep_files as the names of the tools here, to match the names of the tools that they behave like, but if by_name and contents is enough to make it clear to Claude what to do then I'm happy with it.

I've used this for over a month and it generally works well. I don't mind changing to "find_files" and "grep_files". I'm not sure if it would change the behavior. Based on my use of Claude for the past several weeks, I have renamed the pattern argument to searchText. I had noticed sometimes Claude was invoking the tool with a regex, probably because the word "pattern" is strongly associated with "regular expression".

sebastien-rosset avatar Apr 29 '25 03:04 sebastien-rosset

I tried your change and my LLM was confused.. so I asked it to figure out what was wrong and suggest a change.. there is what it suggested:

The original instructions were a bit open-ended, which led to a minor confusion about parameter names. Here are a couple of improvements that could make it clearer:

  • Explicitly mention parameter names: The prompt could have stated that search_file_contents uses a parameter named searchText (not pattern), and search_files_by_name uses pattern. This hint would directly alert the user to use the correct keys without having to infer from code or error messages.
  • Provide a usage example: Including a short example of each function’s usage in the prompt would serve as a template. For instance, showing a JSON snippet for each case (similar to the examples above) would remove ambiguity.
  • Clarify the goal: If the goal was to teach the assistant how to use the tool properly, the prompt might explicitly say, "Find out the correct parameters by reading the tool’s source code or documentation, then demonstrate a correct usage." This way, it’s clear that simply guessing (as happened initially) might not be enough, and that examining the definitions is required.

By refining the prompt with these details, the assistant (or developer) would be less likely to misuse the functions and run into errors.

rakelkar avatar May 11 '25 20:05 rakelkar

Here is the prompt hint it suggested: This is a tool that works with the filesystem. Assistant: Tips for Using the Filesystem Search Tools:

                        - **Distinguish Search Functions:** There are two different search functions:
                        - `filesystem.search_files_by_name` – use this to find files or directories by name (uses a **`pattern`** argument).
                        - `filesystem.search_file_contents` – use this to search **inside** files for text (uses a **`searchText`** argument).

                        - **Required Parameters:**
                        - **`path`** (for **both** functions): The directory (or file) path where the search should occur. Must be within an allowed directory (use `filesystem.list_allowed_directories` to see allowed paths).
                        - **`pattern`** (for **name search**): The filename pattern to match (supports wildcards like `*.txt` or substrings).
                        - **`searchText`** (for **content search**): The text string (or regex, if enabled) to search for **within** file contents.

                        - **Optional Parameters:**
                        - *For `search_files_by_name`:* `excludePatterns` (array of patterns to **omit** from the search, e.g., `["node_modules/**"]` to skip certain folders).
                        - *For `search_file_contents`:*
                            - `useRegex` (boolean, default `false`): treat `searchText` as regex if true.
                            - `caseSensitive` (boolean, default `false`): make search case-sensitive if true.
                            - `maxResults` (number, default `100`): limit the number of matches returned.
                            - `contextLines` (number, default `2`): lines of context to show before/after each match.
                            - `includePatterns` (array): patterns to **include only** specific files in the search.
                            - `excludePatterns` (array): patterns to exclude files or directories.

                        - **Important:** **Do not confuse the parameter names.** For content searches, always use `searchText` (not `pattern`). For name searches, use `pattern`. Using the wrong key will result in an error (e.g., a missing required field error for `searchText` if you accidentally used `pattern` in `search_file_contents`).

                        - **Example Usage:**
                        - *Search within files for "TODO":*
                            ```json
                            {
                            "path": "/project/src",
                            "searchText": "TODO",
                            "caseSensitive": false,
                            "excludePatterns": ["node_modules/**", "*.min.js"]
                            }
                            ```
                            (This looks in the `/project/src` directory for the text "TODO", case-insensitive, skipping `node_modules` and any `.min.js` files.)
                        - *Search for files with "config" in their name:*
                            ```json
                            {
                            "path": "/project",
                            "pattern": "*config*",
                            "excludePatterns": ["**/build/**"]
                            }
                            ```
                            (This finds files or folders under `/project` with "config" in the name, excluding any in a `build` directory.)

rakelkar avatar May 11 '25 20:05 rakelkar

I tried your change and my LLM was confused.. so I asked it to figure out what was wrong and suggest a change.. there is what it suggested:

@rakelkar, out of curiosity, what LLM were you using?

When using Claude desktop, the application reads the available tools and their arguments using "prompts/lists" as shown in the log below. The MCP server responds with the input schema, parameter names and description.

2025-05-13T14:43:04.025Z [info] [filesystem] Message from client: {"method":"prompts/list","params":{},"jsonrpc":"2.0","id":4}
2025-05-13T14:43:04.029Z [info] [filesystem] Message from server: {"jsonrpc":"2.0","id":1,"result":{"tools":[{"name":"read_file","description":"Read the complete contents of a file from the file system. Handles various text encodings and provides detailed error messages if the file cannot be read. Use this tool when you need to examine the contents of a single file. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string"}},"required":["path"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"read_multiple_files","description":"Read the contents of multiple files simultaneously. This is more efficient than reading files one by one when you need to analyze or compare multiple files. Each file's content is returned with its path as a reference. Failed reads for individual files won't stop the entire operation. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"paths":{"type":"array","items":{"type":"string"}}},"required":["paths"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"write_file","description":"Create a new file or completely overwrite an existing file with new content. Use with caution as it will overwrite existing files without warning. Handles text content with proper encoding. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string"},"content":{"type":"string"}},"required":["path","content"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"edit_file","description":"Make line-based edits to a text file. Each edit replaces exact line sequences with new content. Returns a git-style diff showing the changes made. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string","description":"File path to edit"},"edits":{"type":"array","items":{"type":"object","properties":{"oldText":{"type":"string","description":"Text to search for - must match exactly"},"newText":{"type":"string","description":"Text to replace with"}},"required":["oldText","newText"],"additionalProperties":false}},"dryRun":{"type":"boolean","default":false,"description":"Preview changes using git-style diff format"}},"required":["path","edits"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"create_directory","description":"Create a new directory or ensure a directory exists. Can create multiple nested directories in one operation. If the directory already exists, this operation will succeed silently. Perfect for setting up directory structures for projects or ensuring required paths exist. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string"}},"required":["path"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"list_directory","description":"Get a detailed listing of all files and directories in a specified path. Results clearly distinguish between files and directories with [FILE] and [DIR] prefixes. This tool is essential for understanding directory structure and finding specific files within a directory. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string"}},"required":["path"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"directory_tree","description":"Get a recursive tree view of files and directories as a JSON structure. Each entry includes 'name', 'type' (file/directory), and 'children' for directories. Files have no children array, while directories always have a children array (which may be empty). The output is formatted with 2-space indentation for readability. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string"}},"required":["path"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"move_file","description":"Move or rename files and directories. Can move files between directories and rename them in a single operation. If the destination exists, the operation will fail. Works across different directories and can be used for simple renaming within the same directory. Both source and destination must be within allowed directories.","inputSchema":{"type":"object","properties":{"source":{"type":"string"},"destination":{"type":"string"}},"required":["source","destination"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"search_files_by_name","description":"Find files and directories whose names match a pattern. Searches recursively through all subdirectories from the starting path. Supports glob patterns like '*.txt' or '**/*.js' as well as simple substring matching. The search is case-insensitive by default unless the pattern contains uppercase characters. Returns full paths to all items with matching names. Great for finding files when you don't know their exact location. Only searches within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string","description":"The root directory path to start the search from"},"pattern":{"type":"string","description":"Pattern to match within file/directory names. Supports glob patterns. Case insensitive unless pattern contains uppercase characters"},"excludePatterns":{"type":"array","items":{"type":"string"},"default":[],"description":"Glob patterns for paths to exclude from search (e.g., \"node_modules/**\")"}},"required":["path","pattern"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"search_file_contents","description":"Search for text patterns within file contents. Can search either a single file or recursively through a directory. Supports both plain text substring matching and regex patterns. The search can be case-sensitive or insensitive based on parameters. Returns matching file paths along with line numbers and context lines before/after the match. Only searches within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string","description":"Path to search - can be a file path or a directory to search recursively"},"searchText":{"type":"string","description":"Text to search for - supports plain text substring matching or regex if useRegex is true"},"useRegex":{"type":"boolean","default":false,"description":"When false (default), performs simple text matching. When true, interprets searchText as a regular expression"},"caseSensitive":{"type":"boolean","default":false,"description":"When true, perform case-sensitive matching"},"maxResults":{"type":"number","default":100,"description":"Maximum number of matching results to return"},"contextLines":{"type":"number","default":2,"description":"Number of lines to show before and after each match"},"includePatterns":{"type":"array","items":{"type":"string"},"default":[],"description":"Glob patterns for paths to include in search (e.g., [\"**/*.js\", \"**/*.ts\"])"},"excludePatterns":{"type":"array","items":{"type":"string"},"default":[],"description":"Glob patterns for paths to exclude from search (e.g., [\"node_modules/**\", \"*.test.ts\"])"}},"required":["path","searchText"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"get_file_info","description":"Retrieve detailed metadata about a file or directory. Returns comprehensive information including size, creation time, last modified time, permissions, and type. This tool is perfect for understanding file characteristics without reading the actual content. Only works within allowed directories.","inputSchema":{"type":"object","properties":{"path":{"type":"string"}},"required":["path"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}},{"name":"list_allowed_directories","description":"Returns the list of directories that this server is allowed to access. Use this to understand which directories are available before trying to access files.","inputSchema":{"type":"object","properties":{},"required":[]}}]}}

sebastien-rosset avatar May 13 '25 14:05 sebastien-rosset

@olaservo , is there anything that I need to do such that this PR can be considered for review?

sebastien-rosset avatar Jun 18 '25 21:06 sebastien-rosset

The official MCP Servers cannot guarantee itself even for such simple tools, so better unit test them or consider implementing these basic ones by ourselves.

Symbolk avatar Jul 06 '25 06:07 Symbolk

I think this needs to be a tool in that server.

turabisaas avatar Jul 25 '25 07:07 turabisaas

Not sure what's holding up this PR, but the tool failing to search for files by extension seems like it should be fixed ...

mcp

search_files:
Arguments: {"path":"/tmp/work","pattern":"*.png"}
Result: [{"type":"text","text":"No matches found"}]

check results

find /tmp/work -name '*.png' | wc -l
       4

davidpfarrell avatar Aug 21 '25 05:08 davidpfarrell

Not sure what's holding up this PR, but the tool failing to search for files by extension seems like it should be fixed ...

According to the CONTRIBUTING.md, improvements to existing servers like this filesystem enhancement are welcome, but I don't have visibility into the review timelines and I don't know if a repo maintainer will look at this PR.

I'm happy to address any concerns or make adjustments. The fix has been working well in my testing environment for months, and I believe it would benefit the broader MCP community.

sebastien-rosset avatar Aug 21 '25 17:08 sebastien-rosset

Resolved merge conflicts after #543 was merged.

sebastien-rosset avatar Aug 22 '25 18:08 sebastien-rosset

Awesome, thanks for the contribution! Sorry we've been very slow getting to this - we're a small team, most of whom are volunteer contributors.

The poor tool description definitely seems like a bug in the filesystem server, and it makes a lot of sense why models are confused by it. I think also the patterns was quite confusing for models previously.

After #623 and #745, I think this should be quite a bit better. Additionally, I've taken a stab at fixing the problem with a more minor description change in #2606 - somewhat inspired by the suggestions in the issue you raised, so thank you.

I think we're trying to keep these servers small and avoid adding net new functionality so I'm hesitant to accept a new tool search_file_contents right now (as you have probably realized from your experiences raising this PR, we're stretched for maintainer time so more things to maintain is scary!). The filesystem MCP (and other reference servers) are meant to be a developer reference for how you can build servers, rather than a primary server for people to use - I think maybe we need to do a better job communicating this. We'd be very happy to accept a PR to add links to different filesystem MCPs into the README that explore things like having a search_file_contents tool :)

However, I would be happy to approve and merge the parts around improving testing and nested folder traversal performance, if you'd be open to reworking this PR to target that / opening a new PR for that and tagging me? Possibly it'd be worth waiting until #2606 is merged, given I think that's a small one which can go in easily soon - and save you having to resolve yet another set of merge conflicts 😅

Thanks again for putting up with us here, and for thoughtfully raising the issue and PR! And sorry filesystem MCP has been in such a bad state this whole time.

domdomegg avatar Aug 23 '25 07:08 domdomegg

Awesome, thanks for the contribution! Sorry we've been very slow getting to this - we're a small team, most of whom are volunteer contributors.

The poor tool description definitely seems like a bug in the filesystem server, and it makes a lot of sense why models are confused by it. I think also the patterns was quite confusing for models previously.

After #623 and #745, I think this should be quite a bit better. Additionally, I've taken a stab at fixing the problem with a more minor description change in #2606 - somewhat inspired by the suggestions in the issue you raised, so thank you.

I think we're trying to keep these servers small and avoid adding net new functionality so I'm hesitant to accept a new tool search_file_contents right now (as you have probably realized from your experiences raising this PR, we're stretched for maintainer time so more things to maintain is scary!).

I've removed the search_file_contents action in this PR, and I've retained that new action in a separate branch.

The filesystem MCP (and other reference servers) are meant to be a developer reference for how you can build servers, rather than a primary server for people to use -

Hmm, the filesystem MCP server is used in production by many applications, including Claude desktop, VS code, Cursor and many others. The "filesystem" tool is not "just a reference implementation".

I think maybe we need to do a better job communicating this. We'd be very happy to accept a PR to add links to different filesystem MCPs into the README that explore things like having a search_file_contents tool :)

Interesting, I would have thought it makes sense to extend the existing MCP tool rather than create a new one. This is what I had in mind:

  1. Applications that already integrate the filesystem MCP server for file operations would immediately benefit from the search content capability, without having to install yet another MCP tool.
  2. File content search is fundamentally a filesystem operation - the LLM is querying the filesystem about what's inside files, just as it can query about file names and metadata.
  3. Content search often needs to be combined with file metadata (size, modification time, permissions) and path operations. Having everything in one server eliminates round-trips and enables more efficient combined operations.
  4. Mental model: developers think of content search as a filesystem operation (like grep or IDE search). Splitting it into a separate tool creates cognitive overhead - users would need to remember which server handles which file-related operations.

However, I would be happy to approve and merge the parts around improving testing and nested folder traversal performance, if you'd be open to reworking this PR to target that / opening a new PR for that and tagging me? Possibly it'd be worth waiting until #2606 is merged, given I think that's a small one which can go in easily soon - and save you having to resolve yet another set of merge conflicts 😅

Working on it by first resolving merge conflicts and removing the search_file_contents action.

Thanks again for putting up with us here, and for thoughtfully raising the issue and PR! And sorry filesystem MCP has been in such a bad state this whole time.

sebastien-rosset avatar Aug 25 '25 18:08 sebastien-rosset