NuGet.Client icon indicating copy to clipboard operation
NuGet.Client copied to clipboard

Perf: Avoid expensive file globbing in ContentFileUtils for exact path matches

Open Copilot opened this issue 3 months ago • 5 comments

Bug

Fixes: https://github.com/NuGet/Home/issues/[issue_number]

Description

ContentFileUtils unconditionally uses file globbing even when nuspec <contentFiles> patterns contain no wildcards. Most packages use exact paths. Globbing walks the directory structure checking each level (e.g., contentFiles/any/any/file.txt), while simple string comparison suffices.

Changes

  • Fast path for exact matches: When include pattern has no * or ? wildcards and no exclude pattern, use string.Equals() with OrdinalIgnoreCase
  • Unchanged globbing path: Patterns with wildcards or excludes continue using Matcher
  • Added integration test: Validates both exact match (fast) and wildcard (slow) paths produce correct results
  • Build fix: Removed nullable reference type annotations (Matcher? and !) to fix CS8632 error since the file doesn't have #nullable annotations context
  • Test fix: Updated test to use non-overlapping patterns to ensure each file matches exactly one rule, avoiding conflicts where multiple rules would apply to the same file
// Before: Always creates Matcher and walks directory structure
var matcher = new Matcher(StringComparison.OrdinalIgnoreCase);
matcher.AddInclude(filesEntry.Include);
var matchResults = matcher.Execute(globbingDirectory);

// After: Fast path for common case
bool useSimpleMatch = !ContainsGlobWildcards(filesEntry.Include) && filesEntry.Exclude == null;
Matcher matcher = null;  // Fixed: Removed nullable annotation
if (useSimpleMatch) {
    isMatch = string.Equals(relativePath, filesEntry.Include, StringComparison.OrdinalIgnoreCase);
} else {
    matcher = new Matcher(StringComparison.OrdinalIgnoreCase);
    // existing globbing logic
    var matchResults = matcher.Execute(globbingDirectory);  // Fixed: Removed null-forgiving operator
}

PR Checklist

  • [x] Meaningful title, helpful description and a linked NuGet/Home issue
  • [x] Added tests
  • [ ] Link to an issue or pull request to update docs if this PR changes settings, environment variables, new feature, etc.
Original prompt

This section details on the original issue you should resolve

<issue_title>Perf: ContentFileUtils uses expensive file globbing when not needed</issue_title> <issue_description>### NuGet Product Used

Other/NA

Product Version

all

Worked before?

No response

Impact

None

Repro Steps & Context

This codepath is used when restoring projects using PackageReference, and uses packages that contain contentFiles/. Therefore, restore performance could be improved.

When a package contains contentFiles/, and the project uses PackageReference, ContentFileUtils.cs is used to match the files in the zip, to metadata in the .nuspec. This code currently always uses a file globbing API to do the match. However, using NuGet Insights, we can see that most packages that have contentFiles/ do not use * or ? characters in the .nuspec <contentFiles> definitions.

Debugging restore with a suitable package, the way the NuGet uses the file globbing library, it will see a file contentFiles/any/any/something.txt, and a "pattern" any/any/something.txt, and the file globbing library will ask for the "root directory" (which in this case is contentFiles/, then ask if contentFiles/ contains any/, then ask if contentFiles/any/ contains any/, and finally if contentFiles/any/any/ contains something.txt.

So, as we can see, there's a lot of unnecessary work that needs to be done because NuGet has a list of strings that represent the full path of all content files in the zip, and it's usually trying to match the substring following the contentFiles/ prefix.

Therefore, I believe this could be implemented to be much higher performance, although there are some / vs \ matching, and case sensitivity, issues that need to be accounted for.

Basically, do "simple" string matching when the nuspec's entry's include doesn't contain either *, or ? (and maybe if there's also no exclude, but an exclude doesn't make sense on an exact filename). Only use the file globbing library when file globbing is actually needed.

Verbose Logs

No response</issue_description>

Comments on the Issue (you are @copilot in this section)

@davkean @zivkan Had a thought, what if NuGet install of a package resolved resolved globs filters and turned them into direct references to the files it copied into the package directory, so it never needs to do this during restore? Alternatively could do it at pack time but that wouldn't benefit old packages. @zivkan > Had a thought, what if NuGet install of a package resolved resolved globs filters and turned them into direct references to the files it copied into the package directory, so it never needs to do this during restore?

While NuGet doesn't currently have anything to validate that the files on disk match the files in the nupkg (for example to detect when people using hard links accidentally overwrite files in the global packages folder), I have in the past suggested to customers to write scripts to do so, helping them find these hard link and symbolic link issues in their builds. Of course, if NuGet added such validation, it could re-use the "expand contentFiles in the nuspec" logic in the comparison, but it will make 3rd party tools more difficult (and will break all existing scripts/tools).

All up, it'd have to be a trade off between ease of validating file corruption vs how many packages use file globbing patterns and how many files those globs actually match.

Alternatively could do it at pack time but that wouldn't benefit old packages.

Even if it doesn't benefit old packages, if it nudges all new packages to use the low-cost path, so it's still a benefit.

In either case, I'm wondering if a glob that matches many files might actually have a negative impact if expanded. If there is 1 glob matching 1000 files, that's 1000 iterations of the current loop, but will be 1,000,000 iterations if the glob is expanded.</comment_new>

  • Fixes NuGet/Home#12695

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot avatar Nov 13 '25 23:11 Copilot

@copilot fix the build error "##[error]src\NuGet.Core\NuGet.Commands\RestoreCommand\ContentFiles\ContentFileUtils.cs(122,24): Error CS8632: The annotation for nullable reference types should only be used in code within a '#nullable' annotations context. "

kartheekp-ms avatar Nov 14 '25 00:11 kartheekp-ms

@copilot fix the build error "##[error]src\NuGet.Core\NuGet.Commands\RestoreCommand\ContentFiles\ContentFileUtils.cs(122,24): Error CS8632: The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

kartheekp-ms avatar Nov 14 '25 00:11 kartheekp-ms

@copilot fix the build error "##[error]src\NuGet.Core\NuGet.Commands\RestoreCommand\ContentFiles\ContentFileUtils.cs(122,24): Error CS8632: The annotation for nullable reference types should only be u...

Fixed in commit fdd170d. Removed the nullable annotation (?) from the Matcher variable declaration and the null-forgiving operator (!) since this file doesn't have a #nullable annotations context.

Copilot avatar Nov 14 '25 00:11 Copilot

@copilot There was a test failure: Assert.Equal() Failure: Strings differ ↓ (pos 2) Expected: "Content" Actual: "Compile" ↑ (pos 2)

Stack trace at NuGet.Commands.Test.ContentFilesTests.ContentFiles_ExactMatchAndWildcardPatterns() in /home/vsts/work/1/s/test/NuGet.Core.Tests/NuGet.Commands.Test/ContentFilesTests.cs:line 1830

kartheekp-ms avatar Nov 14 '25 02:11 kartheekp-ms

@copilot There was a test failure: Assert.Equal() Failure: Strings differ ↓ (pos 2) Expected: "Content"...

Fixed in commit 8185292. The test had overlapping patterns where exact.txt matched both the exact match rule (any/any/exact.txtbuildAction="Content") and the wildcard rule (any/any/*.txtbuildAction="Compile"). Since rules are applied in order and the last one wins, it got "Compile" instead of "Content".

Updated the test to use non-overlapping patterns:

  • Changed wildcard pattern from any/any/*.txt to any/any/other*.txt
  • Added other.txt file to match this pattern
  • Now each file matches exactly one rule, testing both fast path (exact match) and slow path (wildcards with exclude) correctly.

Copilot avatar Nov 14 '25 02:11 Copilot