msbuild icon indicating copy to clipboard operation
msbuild copied to clipboard

GetFileSpecInfoWithRegexObject allocates Regex

Open KirillOsenkov opened this issue 6 months ago • 11 comments

I noticed this allocates too much:

https://github.com/dotnet/msbuild/blob/4b2d01ad2b0ef3ee4a05ed29ff1ec2b558b51f17/src/Shared/FileMatcher.cs#L1498

Image

Let's investigate if we can cache these regexes, I suspect there's a very limited number of expressions here that we can reuse regexes for

KirillOsenkov avatar May 20 '25 19:05 KirillOsenkov

This simple change:

        private ConcurrentDictionary<string, (Regex regex, bool needsRecursion, bool isLegalFileSpec)> regexCache =
            new(StringComparer.Ordinal);

        internal void GetFileSpecInfoWithRegexObject(
            string filespec,
            out Regex regexFileMatch,
            out bool needsRecursion,
            out bool isLegalFileSpec)
        {
            var result = regexCache.GetOrAdd(filespec, spec =>
            {
                GetFileSpecInfoWithRegexObjectCore(spec, out var regex, out var needsRec, out var isLegal);
                return (regex, needsRec, isLegal);
            });
            regexFileMatch = result.regex;
            needsRecursion = result.needsRecursion;
            isLegalFileSpec = result.isLegalFileSpec;
        }

Shaves off 0.6 seconds from our incremental build, 13.9s -> 13.3s:

Original:

13.707
13.717
14.019
13.601
14.755
Average: 13.960 s

After the fix:

13.473
13.242
13.598
13.057
13.361
Average: 13.346 s

KirillOsenkov avatar May 20 '25 23:05 KirillOsenkov

it creates a regex for every metadata for every globbed file, resulting in >20,000 identical regexes in our relatively simple build.

KirillOsenkov avatar May 20 '25 23:05 KirillOsenkov

The PR is out, at least the first iteration.

SimaTian avatar May 30 '25 08:05 SimaTian

it creates a regex for every metadata for every globbed file

That sounds bad, what's the callstack here? Do we actually need different things for different metadata? For that matter do we need different things for different globs?

@SimaTian I'd like to understand a bit more before we go with the cache.

rainersigwald avatar May 30 '25 15:05 rainersigwald

Here's a repro:

<Project>

  <Target Name="Build">
    <PropertyGroup>
      <SourceDir>C:\temp\largedir</SourceDir>
      <DestinationDir>C:\temp\2</DestinationDir>
    </PropertyGroup>
    <ItemGroup>
      <File Include="$(SourceDir)\**\*" />
    </ItemGroup>
    <Copy SourceFiles="@(File)" DestinationFiles="@(File->'$(DestinationDir)\%(RecursiveDir)%(Filename)%(Extension)')" SkipUnchangedFiles="True" />
  </Target>

</Project>

Place a breakpoint in GetFileSpecInfoWithRegexObject(), ensure the C:\temp\largedir has more than one file, the method will be hit for every file with the same filespec parameter value.

KirillOsenkov avatar May 31 '25 00:05 KirillOsenkov

Bonus if you can compare the perf and memory usage of the fix on a very large directory before and after the fix.

KirillOsenkov avatar May 31 '25 00:05 KirillOsenkov

Nice self-contained repro. So the calls are:

>	Microsoft.Build.dll!Microsoft.Build.Shared.FileMatcher.GetFileSpecInfoWithRegexObject(string filespec, out System.Text.RegularExpressions.Regex regexFileMatch, out bool needsRecursion, out bool isLegalFileSpec) Line 1494	C#
 	Microsoft.Build.dll!Microsoft.Build.Shared.FileMatcher.FileMatch(string filespec, string fileToMatch) Line 1827	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.BuiltInMetadata.GetRecursiveDirValue(string evaluatedIncludeBeforeWildcardExpansionEscaped, string evaluatedIncludeEscaped) Line 110	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.BuiltInMetadata.GetMetadataValueEscaped(string currentDirectory, string evaluatedIncludeBeforeWildcardExpansionEscaped, string evaluatedIncludeEscaped, string definingProjectEscaped, string name, ref string fullPath) Line 81	C#
 	Microsoft.Build.dll!Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetBuiltInMetadataEscaped(string name) Line 1938	C#
 	Microsoft.Build.dll!Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetMetadataEscaped(string metadataName) Line 1353	C#
 	Microsoft.Build.dll!Microsoft.Build.Execution.ProjectItemInstance.Microsoft.Build.Evaluation.IItem.GetMetadataValueEscaped(string name) Line 448	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.MetadataMatchEvaluator.GetMetadataValueFromMatch(System.Text.RegularExpressions.Match match) Line 3082	C#
 	System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.Replace.AnonymousMethod__99_0(ref (System.Text.StructListBuilder<System.ReadOnlyMemory<char>> segments, System.Text.RegularExpressions.MatchEvaluator evaluator, int prevat, string input, int count) state, System.Text.RegularExpressions.Match match)	Unknown
 	System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.RunAllMatchesWithCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)>(string inputString, System.ReadOnlySpan<char> inputSpan, int startat, ref (System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int) state, System.Text.RegularExpressions.MatchCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)> callback, System.Text.RegularExpressions.RegexRunnerMode mode, bool reuseMatchObject)	Unknown
 	System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.RunAllMatchesWithCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)>(string input, int startat, ref (System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int) state, System.Text.RegularExpressions.MatchCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)> callback, System.Text.RegularExpressions.RegexRunnerMode mode, bool reuseMatchObject)	Unknown
 	System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.Replace(System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.Regex regex, string input, int count, int startat)	Unknown
 	System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.IntrinsicItemFunctions<Microsoft.Build.Execution.ProjectItemInstance>.ExpandQuotedExpressionFunction(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, Microsoft.Build.Shared.IElementLocation elementLocation, bool includeNullEntries, string functionName, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsOfType, string[] arguments) Line 2732	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.Transform<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, bool includeNullEntries, System.Collections.Generic.Stack<Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.TransformFunction<Microsoft.Build.Execution.ProjectItemInstance>> transformFunctionStack, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsOfType) Line 1812	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.Transform<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, bool includeNullEntries, System.Collections.Generic.Stack<Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.TransformFunction<Microsoft.Build.Execution.ProjectItemInstance>> transformFunctionStack, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsOfType) Line 1803	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.ExpandExpressionCapture<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, Microsoft.Build.Evaluation.ExpressionShredder.ItemExpressionCapture expressionCapture, Microsoft.Build.Evaluation.IItemProvider<Microsoft.Build.Execution.ProjectItemInstance> evaluatedItems, Microsoft.Build.Shared.IElementLocation elementLocation, Microsoft.Build.Evaluation.ExpanderOptions options, bool includeNullEntries, out bool isTransformExpression, out System.Collections.Generic.List<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsFromCapture) Line 2062	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.ExpandExpressionCaptureIntoStringBuilder<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, Microsoft.Build.Evaluation.ExpressionShredder.ItemExpressionCapture capture, Microsoft.Build.Evaluation.IItemProvider<Microsoft.Build.Execution.ProjectItemInstance> evaluatedItems, Microsoft.Build.Shared.IElementLocation elementLocation, Microsoft.NET.StringTools.SpanBasedStringBuilder builder, Microsoft.Build.Evaluation.ExpanderOptions options) Line 2197	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.ExpandItemVectorsIntoString<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, string expression, Microsoft.Build.Evaluation.IItemProvider<Microsoft.Build.Execution.ProjectItemInstance> items, Microsoft.Build.Evaluation.ExpanderOptions options, Microsoft.Build.Shared.IElementLocation elementLocation) Line 2123	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ExpandIntoStringLeaveEscaped(string expression, Microsoft.Build.Evaluation.ExpanderOptions options, Microsoft.Build.Shared.IElementLocation elementLocation) Line 500	C#
 	Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ExpandIntoStringAndUnescape(string expression, Microsoft.Build.Evaluation.ExpanderOptions options, Microsoft.Build.Shared.IElementLocation elementLocation) Line 477	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskExecutionHost.InitializeTaskScalarParameter(Microsoft.Build.Framework.TaskPropertyInfo parameter, System.Type parameterType, string parameterValue, Microsoft.Build.Construction.ElementLocation parameterLocation, out bool taskParameterSet) Line 1187	C#

So intrinsic-item-function metadata expansion calls GetMetadataValue on the item, which calls

https://github.com/dotnet/msbuild/blob/b62685dbb1f119aeaa76fab4432a4d90f40b85f5/src/Build/Definition/BuiltInMetadata.cs#L81

so "given a file and the wildcard-laden expression that produced it, break out the glob parts". Not too unreasonable though it'd be nice if we could retroactively know this when expanding the glob and just save all the bits and bobs.

This then creates a string that looks something like

^S:[/\\]+msbuild[/\\]+(?<WILDCARDDIR>((.*/)|(.*\\)|()))(?<FILENAME>[^/\\]*)$

and constructs the regex.

MSBuildGlob solves my unlimited-growth problems with a WeakDictionary cache:

https://github.com/dotnet/msbuild/blob/d0d0b2b62116d489b395be617f38ae9c01ad8594/src/Build/Globbing/MSBuildGlob.cs#L203-L226

(at the cost of rebuilding the regex after GCs a bunch, probably).

Honestly I'm torn between "I bet a WeakDictionary fixes all the real problems" and "use this as an excuse to stop misusing Regex here!"

rainersigwald avatar Jun 02 '25 20:06 rainersigwald

I'd be curious to see the size of the cache after Orchard build

KirillOsenkov avatar Jun 03 '25 00:06 KirillOsenkov

Orchard build, single process: 46 for an incremental build 40 for Clean 46 for full build (in retrospect I should've expected that)

SimaTian avatar Jun 10 '25 12:06 SimaTian

And I commented there and only then read Rainer's comment about the Glob cache :-)

SimaTian avatar Jun 10 '25 12:06 SimaTian

Thanks for getting this data!

KirillOsenkov avatar Jun 10 '25 22:06 KirillOsenkov