msbuild
msbuild copied to clipboard
GetFileSpecInfoWithRegexObject allocates Regex
I noticed this allocates too much:
https://github.com/dotnet/msbuild/blob/4b2d01ad2b0ef3ee4a05ed29ff1ec2b558b51f17/src/Shared/FileMatcher.cs#L1498
Let's investigate if we can cache these regexes, I suspect there's a very limited number of expressions here that we can reuse regexes for
This simple change:
private ConcurrentDictionary<string, (Regex regex, bool needsRecursion, bool isLegalFileSpec)> regexCache =
new(StringComparer.Ordinal);
internal void GetFileSpecInfoWithRegexObject(
string filespec,
out Regex regexFileMatch,
out bool needsRecursion,
out bool isLegalFileSpec)
{
var result = regexCache.GetOrAdd(filespec, spec =>
{
GetFileSpecInfoWithRegexObjectCore(spec, out var regex, out var needsRec, out var isLegal);
return (regex, needsRec, isLegal);
});
regexFileMatch = result.regex;
needsRecursion = result.needsRecursion;
isLegalFileSpec = result.isLegalFileSpec;
}
Shaves off 0.6 seconds from our incremental build, 13.9s -> 13.3s:
Original:
13.707
13.717
14.019
13.601
14.755
Average: 13.960 s
After the fix:
13.473
13.242
13.598
13.057
13.361
Average: 13.346 s
it creates a regex for every metadata for every globbed file, resulting in >20,000 identical regexes in our relatively simple build.
The PR is out, at least the first iteration.
it creates a regex for every metadata for every globbed file
That sounds bad, what's the callstack here? Do we actually need different things for different metadata? For that matter do we need different things for different globs?
@SimaTian I'd like to understand a bit more before we go with the cache.
Here's a repro:
<Project>
<Target Name="Build">
<PropertyGroup>
<SourceDir>C:\temp\largedir</SourceDir>
<DestinationDir>C:\temp\2</DestinationDir>
</PropertyGroup>
<ItemGroup>
<File Include="$(SourceDir)\**\*" />
</ItemGroup>
<Copy SourceFiles="@(File)" DestinationFiles="@(File->'$(DestinationDir)\%(RecursiveDir)%(Filename)%(Extension)')" SkipUnchangedFiles="True" />
</Target>
</Project>
Place a breakpoint in GetFileSpecInfoWithRegexObject(), ensure the C:\temp\largedir has more than one file, the method will be hit for every file with the same filespec parameter value.
Bonus if you can compare the perf and memory usage of the fix on a very large directory before and after the fix.
Nice self-contained repro. So the calls are:
> Microsoft.Build.dll!Microsoft.Build.Shared.FileMatcher.GetFileSpecInfoWithRegexObject(string filespec, out System.Text.RegularExpressions.Regex regexFileMatch, out bool needsRecursion, out bool isLegalFileSpec) Line 1494 C#
Microsoft.Build.dll!Microsoft.Build.Shared.FileMatcher.FileMatch(string filespec, string fileToMatch) Line 1827 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.BuiltInMetadata.GetRecursiveDirValue(string evaluatedIncludeBeforeWildcardExpansionEscaped, string evaluatedIncludeEscaped) Line 110 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.BuiltInMetadata.GetMetadataValueEscaped(string currentDirectory, string evaluatedIncludeBeforeWildcardExpansionEscaped, string evaluatedIncludeEscaped, string definingProjectEscaped, string name, ref string fullPath) Line 81 C#
Microsoft.Build.dll!Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetBuiltInMetadataEscaped(string name) Line 1938 C#
Microsoft.Build.dll!Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetMetadataEscaped(string metadataName) Line 1353 C#
Microsoft.Build.dll!Microsoft.Build.Execution.ProjectItemInstance.Microsoft.Build.Evaluation.IItem.GetMetadataValueEscaped(string name) Line 448 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.MetadataMatchEvaluator.GetMetadataValueFromMatch(System.Text.RegularExpressions.Match match) Line 3082 C#
System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.Replace.AnonymousMethod__99_0(ref (System.Text.StructListBuilder<System.ReadOnlyMemory<char>> segments, System.Text.RegularExpressions.MatchEvaluator evaluator, int prevat, string input, int count) state, System.Text.RegularExpressions.Match match) Unknown
System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.RunAllMatchesWithCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)>(string inputString, System.ReadOnlySpan<char> inputSpan, int startat, ref (System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int) state, System.Text.RegularExpressions.MatchCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)> callback, System.Text.RegularExpressions.RegexRunnerMode mode, bool reuseMatchObject) Unknown
System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.RunAllMatchesWithCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)>(string input, int startat, ref (System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int) state, System.Text.RegularExpressions.MatchCallback<(System.Text.StructListBuilder<System.ReadOnlyMemory<char>>, System.Text.RegularExpressions.MatchEvaluator, int, string, int)> callback, System.Text.RegularExpressions.RegexRunnerMode mode, bool reuseMatchObject) Unknown
System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.Replace(System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.Regex regex, string input, int count, int startat) Unknown
System.Text.RegularExpressions.dll!System.Text.RegularExpressions.Regex.Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator) Unknown
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.IntrinsicItemFunctions<Microsoft.Build.Execution.ProjectItemInstance>.ExpandQuotedExpressionFunction(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, Microsoft.Build.Shared.IElementLocation elementLocation, bool includeNullEntries, string functionName, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsOfType, string[] arguments) Line 2732 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.Transform<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, bool includeNullEntries, System.Collections.Generic.Stack<Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.TransformFunction<Microsoft.Build.Execution.ProjectItemInstance>> transformFunctionStack, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsOfType) Line 1812 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.Transform<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, bool includeNullEntries, System.Collections.Generic.Stack<Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.TransformFunction<Microsoft.Build.Execution.ProjectItemInstance>> transformFunctionStack, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsOfType) Line 1803 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.ExpandExpressionCapture<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, Microsoft.Build.Evaluation.ExpressionShredder.ItemExpressionCapture expressionCapture, Microsoft.Build.Evaluation.IItemProvider<Microsoft.Build.Execution.ProjectItemInstance> evaluatedItems, Microsoft.Build.Shared.IElementLocation elementLocation, Microsoft.Build.Evaluation.ExpanderOptions options, bool includeNullEntries, out bool isTransformExpression, out System.Collections.Generic.List<System.Collections.Generic.KeyValuePair<string, Microsoft.Build.Execution.ProjectItemInstance>> itemsFromCapture) Line 2062 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.ExpandExpressionCaptureIntoStringBuilder<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, Microsoft.Build.Evaluation.ExpressionShredder.ItemExpressionCapture capture, Microsoft.Build.Evaluation.IItemProvider<Microsoft.Build.Execution.ProjectItemInstance> evaluatedItems, Microsoft.Build.Shared.IElementLocation elementLocation, Microsoft.NET.StringTools.SpanBasedStringBuilder builder, Microsoft.Build.Evaluation.ExpanderOptions options) Line 2197 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ItemExpander.ExpandItemVectorsIntoString<Microsoft.Build.Execution.ProjectItemInstance>(Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance> expander, string expression, Microsoft.Build.Evaluation.IItemProvider<Microsoft.Build.Execution.ProjectItemInstance> items, Microsoft.Build.Evaluation.ExpanderOptions options, Microsoft.Build.Shared.IElementLocation elementLocation) Line 2123 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ExpandIntoStringLeaveEscaped(string expression, Microsoft.Build.Evaluation.ExpanderOptions options, Microsoft.Build.Shared.IElementLocation elementLocation) Line 500 C#
Microsoft.Build.dll!Microsoft.Build.Evaluation.Expander<Microsoft.Build.Execution.ProjectPropertyInstance, Microsoft.Build.Execution.ProjectItemInstance>.ExpandIntoStringAndUnescape(string expression, Microsoft.Build.Evaluation.ExpanderOptions options, Microsoft.Build.Shared.IElementLocation elementLocation) Line 477 C#
Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskExecutionHost.InitializeTaskScalarParameter(Microsoft.Build.Framework.TaskPropertyInfo parameter, System.Type parameterType, string parameterValue, Microsoft.Build.Construction.ElementLocation parameterLocation, out bool taskParameterSet) Line 1187 C#
So intrinsic-item-function metadata expansion calls GetMetadataValue on the item, which calls
https://github.com/dotnet/msbuild/blob/b62685dbb1f119aeaa76fab4432a4d90f40b85f5/src/Build/Definition/BuiltInMetadata.cs#L81
so "given a file and the wildcard-laden expression that produced it, break out the glob parts". Not too unreasonable though it'd be nice if we could retroactively know this when expanding the glob and just save all the bits and bobs.
This then creates a string that looks something like
^S:[/\\]+msbuild[/\\]+(?<WILDCARDDIR>((.*/)|(.*\\)|()))(?<FILENAME>[^/\\]*)$
and constructs the regex.
MSBuildGlob solves my unlimited-growth problems with a WeakDictionary cache:
https://github.com/dotnet/msbuild/blob/d0d0b2b62116d489b395be617f38ae9c01ad8594/src/Build/Globbing/MSBuildGlob.cs#L203-L226
(at the cost of rebuilding the regex after GCs a bunch, probably).
Honestly I'm torn between "I bet a WeakDictionary fixes all the real problems" and "use this as an excuse to stop misusing Regex here!"
I'd be curious to see the size of the cache after Orchard build
Orchard build, single process: 46 for an incremental build 40 for Clean 46 for full build (in retrospect I should've expected that)
And I commented there and only then read Rainer's comment about the Glob cache :-)
Thanks for getting this data!