msbuild icon indicating copy to clipboard operation
msbuild copied to clipboard

Perf: Fast quoted expression expansion

Open ccastanedaucf opened this issue 5 months ago • 0 comments
trafficstars

Fixes

Fairly involved change that trivializes this call path (to the point where retrieving the metadata costs more than the regex!)

image

image

image

image

Context

This bit of code is responsible for nearly all the allocations and runtime here:

matchEvaluator = new MetadataMatchEvaluator(item.Key, item.Value, elementLocation);
include = RegularExpressions.ItemMetadataRegex.Replace(arguments[0], matchEvaluator.GetMetadataValueFromMatch);

Replace takes a user-provided method, which is given each individual Match and spits out a string to concat as the result.

As you see below though, the vast majority of allocations aren't even string-related, but rather the Groups-related objects and internal arrays that Match creates the first time you access the Groups property.

// Match.cs
public virtual GroupCollection Groups => _groupcoll ??= new GroupCollection(this, null);

// GroupCollection.cs
private readonly Match _match;
private readonly Hashtable? _captureMap;

/// <summary>Cache of Group objects fed to the user.</summary>
private Group[]? _groups;

private Group GetGroupImpl(int groupnum)
{
    if (groupnum == 0)
    {
        return _match;
    }

    // Construct all the Group objects the first time GetGroup is called
    if (_groups is null)
    {
        _groups = new Group[_match._matchcount.Length - 1];
        for (int i = 0; i < _groups.Length; i++)
        {
            string groupname = _match._regex!.GroupNameFromNumber(i + 1);
            _groups[i] = new Group(_match.Text, _match._matches[i + 1], _match._matchcount[i + 1], groupname);
        }
    }

    return _groups[groupnum - 1];
}

This is basically impossible to avoid since neither .NET Framework or .NET Core provide an allocation-free way to access a single group that you're interested in, whether by name or ID, or reuse the related objects. The closest thing is a discussion to potentially introduce a ValueGroup object similar to ValueMatch that can be enumerated, but that appears to be a ways off.

This means in order to solve this, we have to find ways to avoid entering the Regex path entirely.   Taken from ExpandQuotedExpressionFunction on main:

image

Taken from MetadataMatchEvaluator on main:

image

Changes Made

Besides the core part of avoiding Regex.Replace() and manually iterating the match object, there are a few observations to make here:

  1. The vast majority of strings here are an exact match of %(ItemSpecModifier) - therefore we can avoid processing these at all by just doing a simple dictionary lookup.
private static readonly FrozenDictionary<string, string> s_itemSpecModifiers = new Dictionary<string, string>()
{
    [$"%{{{ItemSpecModifiers.FullPath}}}"] = ItemSpecModifiers.FullPath,
    [$"%{{{ItemSpecModifiers.RootDir}}}"] = ItemSpecModifiers.RootDir,
    /// ... ect.
}
  1. Many strings passed through here are identical to the previous run - therefore a simple single reference cache avoids another large set of lookups. This is "thread-safe" since 8-byte reads/writes are atomic + we always work on a local reference and just overwrite.
private static string s_lastParsedQuotedExpression;
  1. Many two-match cases are also just a pair of itemspec modifiers - therefore we can avoid accessing the expensive GroupCollection allocation by performing a lookup on each match iteration as well.

  2. Most matches are the single-match case. We can avoid the vast majority of collection / array allocations by wrapping the result in a discriminated union of "one-or-more" matches. In the full match case, we can additional avoid any string allocations as well.

private enum MetadataMatchType
{
    ExactString,
    Single,
    Multiple,
}

readonly struct OneOrMultipleMetadataMatches
{
    internal MetadataMatch Single { get; }

    internal List<MetadataMatch> Multiple { get; }

    internal MetadataMatchType Type { get; }
}

ccastanedaucf avatar Jun 12 '25 21:06 ccastanedaucf