Perf: Avoid expensive file globbing in ContentFileUtils for exact path matches
Bug
Fixes: https://github.com/NuGet/Home/issues/[issue_number]
Description
ContentFileUtils unconditionally uses file globbing even when nuspec <contentFiles> patterns contain no wildcards. Most packages use exact paths. Globbing walks the directory structure checking each level (e.g., contentFiles/ → any/ → any/ → file.txt), while simple string comparison suffices.
Changes
-
Fast path for exact matches: When include pattern has no
*or?wildcards and no exclude pattern, usestring.Equals()withOrdinalIgnoreCase -
Unchanged globbing path: Patterns with wildcards or excludes continue using
Matcher - Added integration test: Validates both exact match (fast) and wildcard (slow) paths produce correct results
-
Build fix: Removed nullable reference type annotations (
Matcher?and!) to fix CS8632 error since the file doesn't have#nullableannotations context - Test fix: Updated test to use non-overlapping patterns to ensure each file matches exactly one rule, avoiding conflicts where multiple rules would apply to the same file
// Before: Always creates Matcher and walks directory structure
var matcher = new Matcher(StringComparison.OrdinalIgnoreCase);
matcher.AddInclude(filesEntry.Include);
var matchResults = matcher.Execute(globbingDirectory);
// After: Fast path for common case
bool useSimpleMatch = !ContainsGlobWildcards(filesEntry.Include) && filesEntry.Exclude == null;
Matcher matcher = null; // Fixed: Removed nullable annotation
if (useSimpleMatch) {
isMatch = string.Equals(relativePath, filesEntry.Include, StringComparison.OrdinalIgnoreCase);
} else {
matcher = new Matcher(StringComparison.OrdinalIgnoreCase);
// existing globbing logic
var matchResults = matcher.Execute(globbingDirectory); // Fixed: Removed null-forgiving operator
}
PR Checklist
- [x] Meaningful title, helpful description and a linked NuGet/Home issue
- [x] Added tests
- [ ] Link to an issue or pull request to update docs if this PR changes settings, environment variables, new feature, etc.
Original prompt
This section details on the original issue you should resolve
<issue_title>Perf: ContentFileUtils uses expensive file globbing when not needed</issue_title> <issue_description>### NuGet Product Used
Other/NA
Product Version
all
Worked before?
No response
Impact
None
Repro Steps & Context
This codepath is used when restoring projects using PackageReference, and uses packages that contain
contentFiles/. Therefore, restore performance could be improved.When a package contains
contentFiles/, and the project uses PackageReference,ContentFileUtils.csis used to match the files in the zip, to metadata in the.nuspec. This code currently always uses a file globbing API to do the match. However, using NuGet Insights, we can see that most packages that havecontentFiles/do not use*or?characters in the.nuspec<contentFiles>definitions.Debugging restore with a suitable package, the way the NuGet uses the file globbing library, it will see a file
contentFiles/any/any/something.txt, and a "pattern"any/any/something.txt, and the file globbing library will ask for the "root directory" (which in this case iscontentFiles/, then ask ifcontentFiles/containsany/, then ask ifcontentFiles/any/containsany/, and finally ifcontentFiles/any/any/containssomething.txt.So, as we can see, there's a lot of unnecessary work that needs to be done because NuGet has a list of strings that represent the full path of all content files in the zip, and it's usually trying to match the substring following the
contentFiles/prefix.Therefore, I believe this could be implemented to be much higher performance, although there are some
/vs\matching, and case sensitivity, issues that need to be accounted for.Basically, do "simple" string matching when the nuspec's entry's include doesn't contain either
*, or?(and maybe if there's also noexclude, but an exclude doesn't make sense on an exact filename). Only use the file globbing library when file globbing is actually needed.Verbose Logs
No response</issue_description>
Comments on the Issue (you are @copilot in this section)
@davkean @zivkan Had a thought, what if NuGet install of a package resolved resolved globs filters and turned them into direct references to the files it copied into the package directory, so it never needs to do this during restore? Alternatively could do it at pack time but that wouldn't benefit old packages.@zivkan > Had a thought, what if NuGet install of a package resolved resolved globs filters and turned them into direct references to the files it copied into the package directory, so it never needs to do this during restore?While NuGet doesn't currently have anything to validate that the files on disk match the files in the nupkg (for example to detect when people using hard links accidentally overwrite files in the global packages folder), I have in the past suggested to customers to write scripts to do so, helping them find these hard link and symbolic link issues in their builds. Of course, if NuGet added such validation, it could re-use the "expand contentFiles in the nuspec" logic in the comparison, but it will make 3rd party tools more difficult (and will break all existing scripts/tools).
All up, it'd have to be a trade off between ease of validating file corruption vs how many packages use file globbing patterns and how many files those globs actually match.
Alternatively could do it at pack time but that wouldn't benefit old packages.
Even if it doesn't benefit old packages, if it nudges all new packages to use the low-cost path, so it's still a benefit.
In either case, I'm wondering if a glob that matches many files might actually have a negative impact if expanded. If there is 1 glob matching 1000 files, that's 1000 iterations of the current loop, but will be 1,000,000 iterations if the glob is expanded.</comment_new>
- Fixes NuGet/Home#12695
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.
@copilot fix the build error "##[error]src\NuGet.Core\NuGet.Commands\RestoreCommand\ContentFiles\ContentFileUtils.cs(122,24): Error CS8632: The annotation for nullable reference types should only be used in code within a '#nullable' annotations context. "
@copilot fix the build error "##[error]src\NuGet.Core\NuGet.Commands\RestoreCommand\ContentFiles\ContentFileUtils.cs(122,24): Error CS8632: The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.
@copilot fix the build error "##[error]src\NuGet.Core\NuGet.Commands\RestoreCommand\ContentFiles\ContentFileUtils.cs(122,24): Error CS8632: The annotation for nullable reference types should only be u...
Fixed in commit fdd170d. Removed the nullable annotation (?) from the Matcher variable declaration and the null-forgiving operator (!) since this file doesn't have a #nullable annotations context.
@copilot There was a test failure: Assert.Equal() Failure: Strings differ ↓ (pos 2) Expected: "Content" Actual: "Compile" ↑ (pos 2)
Stack trace at NuGet.Commands.Test.ContentFilesTests.ContentFiles_ExactMatchAndWildcardPatterns() in /home/vsts/work/1/s/test/NuGet.Core.Tests/NuGet.Commands.Test/ContentFilesTests.cs:line 1830
@copilot There was a test failure: Assert.Equal() Failure: Strings differ ↓ (pos 2) Expected: "Content"...
Fixed in commit 8185292. The test had overlapping patterns where exact.txt matched both the exact match rule (any/any/exact.txt → buildAction="Content") and the wildcard rule (any/any/*.txt → buildAction="Compile"). Since rules are applied in order and the last one wins, it got "Compile" instead of "Content".
Updated the test to use non-overlapping patterns:
- Changed wildcard pattern from
any/any/*.txttoany/any/other*.txt - Added
other.txtfile to match this pattern - Now each file matches exactly one rule, testing both fast path (exact match) and slow path (wildcards with exclude) correctly.