msbuild icon indicating copy to clipboard operation
msbuild copied to clipboard

Make a way for Target authors to easily clean up unused/intermediate Items after Target invocation

Open baronfel opened this issue 3 weeks ago • 6 comments

Summary

Today it's common for authors of MSBuild Target logic to create Items in Targets for many reasons:

  • need to store per-item metadata/computed values
  • incrementally build up to a solution
  • track specific steps through a complex filtering/sorting workflow

An example of this is the following code from MSBuildLocator:

  <Target Name="EnsureMSBuildAssembliesNotCopied" AfterTargets="ResolvePackageAssets" Condition="'$(DisableMSBuildAssemblyCopyCheck)' != 'true'">
    <ItemGroup>
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Framework'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Utilities.Core'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Tasks.Core'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Engine'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Conversion.Core'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Runtime'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Localization'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.Build.Engine'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'Microsoft.NET.StringTools'))" />
      <_MSBuildAssembliesCopyLocalItems Include="@(RuntimeCopyLocalItems->WithMetadataValue('NuGetPackageId', 'NuGet.Frameworks'))" />
      <MSBuildAssembliesCopyLocalItems Include="@(_MSBuildAssembliesCopyLocalItems->WithMetadataValue('CopyLocal', 'true')->WithMetadataValue('AssetType', 'runtime'))" />
      <_DistinctMSBuildPackagesReferencedPoorly Include="@(MSBuildAssembliesCopyLocalItems->Metadata('NuGetPackageId')->Distinct())" />
    </ItemGroup>

    <Error
      Condition="@(_DistinctMSBuildPackagesReferencedPoorly->Count()) > 0"
      Code="MSBL001"
      Text="A PackageReference to the package '%(_DistinctMSBuildPackagesReferencedPoorly.NuGetPackageId)' at version '%(_DistinctMSBuildPackagesReferencedPoorly.NuGetPackageVersion)' is present in this project without ExcludeAssets=&quot;runtime&quot; and PrivateAssets=&quot;all&quot; set. This can cause errors at run-time due to MSBuild assembly-loading."
      HelpLink="https://aka.ms/msbuild/locator/diagnostics/MSBL001"
      />
  </Target>

Where a list of candidates is built up over time in progressive steps, then used to compute a list of PackageReferences that violate the MSBuild Locator packaging rules. In projects with many references this intermediate list can represent hundreds of Items.

These Items often represent discrete steps towards a final goal, but one that goal is accomplished they remain in the engine because there's no clear indicator in the engine if something is actually consuming those items or not.

Today, users must remember to manually clean up these lists (which we haven't done at time of writing in Locator!).

While we can add tooling that warns users when this happens, we should have a more first-class way to annotate that items can be automatically cleaned up by the engine after Target invocation.

Background and Motivation

  • Improved memory usage of the build
  • Reduced developer cognitive load

Proposed Feature

There are a few ways of approaching this, I'll highlight a few in brief:

  • The engine could track usage itself and have some logic for when to clean up Items
    • Difficult to know for certain because Inputs and Outputs of Targets aren't comprehensive today - could lead to premature deletions
  • The individual Items could support a metadata that signaled that they could be deleted after Target Invocation
    • <_MyThing Include="..." AutoRemove="true" />
    • Annoying because it's a bit manual, it makes the 'easy' thing the non-performant thing
    • Metadata name is subject to collisions - if we choose a common name we alter semantics for existing code, if we choose an ugly-but-unique name it's hard for users to remember and use
  • The Target could support declaring names of Item lists to remove after execution
    • <Target Name="DoStuff" Inputs="@(Things);@(Stuff)" Outputs="@(MyThings)" Locals="@(_MyIntermediates)">...</Target>
    • semi-declarative, shows intent, relatively easily tool-able?

Alternative Designs

No response

baronfel avatar Dec 08 '25 14:12 baronfel

See also the OP in dotnet/msbuild#2480

rainersigwald avatar Dec 08 '25 16:12 rainersigwald

See also the OP in dotnet/msbuild#2480

Would be interesting to investigate if there is some possibility of infering this scope, and thus simply being able to release the memory when they're not used.

wasabii avatar Dec 09 '25 14:12 wasabii

@wasabii I think that long-term that will need to be the solution (and future inference/tracking to make Target-invocation parallelism will likely require this), but I want to start with something easy to implement to get some kind of cleanup mechanism in, and then future enhancements could then automate that existing cleanup. Does that make sense? Kind of a walk/crawl/run progression - first detect the problem and let the user manually fix it (buildcheck), then make a language/engine feature to opt into declarative clean up (this issue), then automate that detection + cleanup (the glorious future).

baronfel avatar Dec 09 '25 15:12 baronfel

This reminds me about the GNU C preprocessor feature that detects header guards #ifndef/#define/#endif and then skips every subsequent #include of the same file if the guard macro has already been defined. So I wonder if there is any optimisation opportunity for MSBuild detecting that the target is going to clear the item type at the end, and then allocating the items as target-local in the first place.

KalleOlaviNiemitalo avatar Dec 09 '25 16:12 KalleOlaviNiemitalo

@wasabii I think that long-term that will need to be the solution (and future inference/tracking to make Target-invocation parallelism will likely require this), but I want to start with something easy to implement to get some kind of cleanup mechanism in, and then future enhancements could then automate that existing cleanup. Does that make sense? Kind of a walk/crawl/run progression - first detect the problem and let the user manually fix it (buildcheck), then make a language/engine feature to opt into declarative clean up (this issue), then automate that detection + cleanup (the glorious future).

I think I'd argue against this, because introducing an explicit scope is something you commit users to do, and can never undo the consequences of. 10 years from now you'll have scopes set all over the place in the wild which don't actually do anything. But you'll need to maintain support for it forever.

I'd just tackle the automatic stuff first, focusing on the most obvious cases. Where the usage of an ItemGroup or PropertyGroup can be clearly inferred from it's usage. And then add more advanced cases over time.

But that's just me. My opinions on MSBuild are different. ;)

wasabii avatar Dec 09 '25 16:12 wasabii

I'm pretty pessimistic about the idea of being able to GC properties/items inside MSBuild. The fundamental problem is that they are stored in project-level mutable state, and can be accessed after a target's invocation, in another target or in the OM after build operations. Of course lots of items aren't accessed--but how do we know? We'd have to have static-analysis understanding of every target to see if any of them could reference a specific item, and I don't think there's any way to know "no one will ever call this target then inspect project state for this item name".

rainersigwald avatar Dec 09 '25 17:12 rainersigwald