Look into merging MSBuild's RAR caching and the SDK's assembly (?) caching

Open Forgind opened this issue 2 years ago • 7 comments

Look into what they have in common. Merge the caching to some extent. Possibly merge the code paths? Possibly only run it once and share the results?

Specifically ProcessFrameworkReferences and ResolvePackageAssets.

Forgind avatar Dec 06 '21 17:12 Forgind

> Possibly only run it once and share the results?

This one could have the most potential.

I'm experimenting with a similar design in my prototype BenchBuild, a fast up-to-date build server built with msbuild. For example, I'm generating cache files for all dependent assembly references (including analyzers):

   163 dll-Microsoft.CodeAnalysis.CSharp.NetAnalyzers.dll-43cd394337288a248c9bb1478bfa61aa.cache
   156 dll-Microsoft.CodeAnalysis.NetAnalyzers.dll-964681bd507788578febb23a827afebf.cache       
18,232 fwk-Microsoft.NETCore.App-6.0.0-e9432bea5bfe249209d36d6163b16e05.cache                   

All these files are unique and shared by all projects in a solution, and they represent all the input assembly dependencies of all projects (the benchmark projects I set up don't have any packages yet, so they depend only on the framework).

Then I store the results in a file $ProjectName.csproj.BuildResult.cache. This file has dependencies on the files above (relying on my static-graph PR #7121)

image

In this solution, there are 100 projects. The state file generated by RAR for one project (the red part, LibChild1_0.csproj.AssemblyReference.cache, in the picture) takes 85KB. Multiply by 100 and msbuild is basically loading/saving 8.5MB just for assembly references, even though all these references are the same across my projects. A shared approach could reduce this data to less than 100KB for all the projects.
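
The arithmetic above can be checked with a short sketch. The figures (100 projects, 85KB per state file, under 100KB shared) come from the comment; the variable names are illustrative only, and 1MB is taken as 1000KB to match the 8.5MB figure.

```python
# Figures quoted in the comment above; names are illustrative only.
PROJECTS = 100
RAR_STATE_KB = 85          # size of one *.AssemblyReference.cache file
SHARED_UPPER_BOUND_KB = 100  # "less than 100KB for all the projects"

# Total data loaded/saved when every project keeps its own state file.
per_project_total_kb = PROJECTS * RAR_STATE_KB

# Lower bound on the reduction factor with a shared cache.
min_reduction = per_project_total_kb / SHARED_UPPER_BOUND_KB

print(f"per-project caches: {per_project_total_kb} KB (~{per_project_total_kb / 1000:.1f} MB)")
print(f"shared cache:       < {SHARED_UPPER_BOUND_KB} KB")
print(f"reduction:          at least {min_reduction:.0f}x")
```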

One challenge is that regular msbuild projects don't have a place for a shared build folder (e.g., something like the top-level .vs folder used by Visual Studio for caching).

xoofx avatar Dec 09 '21 21:12 xoofx

Consider a cheap in-proc optimization:

Once we read the info of a .dll on disk, cache it using the full path + timestamp as the key (in a static dictionary). Next time we need to read .dll info, check the file path and timestamp, and if we've read it before, just return the info.

This will only cache per node, and will invalidate if the timestamp changes, but it seems very easy to implement and gets us significant wins, since we no longer need to read metadata for the same .dll repeatedly.
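
A minimal sketch of that idea, written in Python for brevity rather than in the C# that MSBuild uses. The names `get_file_info` and `read_info` are hypothetical and do not correspond to any MSBuild API; `read_info` stands in for whatever expensive metadata read the task performs.

```python
import os
from typing import Any, Callable, Dict, Tuple

# Process-wide ("static") cache: (full path, last-write time in ns) -> info.
_info_cache: Dict[Tuple[str, int], Any] = {}


def get_file_info(path: str, read_info: Callable[[str], Any]) -> Any:
    """Return cached info for `path`, re-reading only if its timestamp changed.

    `read_info` is a hypothetical stand-in for the expensive metadata read.
    """
    full_path = os.path.abspath(path)
    # stat() does not read the file's contents, so a cache hit stays cheap.
    timestamp = os.stat(full_path).st_mtime_ns
    key = (full_path, timestamp)

    if key in _info_cache:
        return _info_cache[key]

    # Cache miss (first read, or the timestamp changed): do the normal work.
    info = read_info(full_path)
    _info_cache[key] = info
    return info
```

Because the timestamp is part of the key, a modified file simply misses the cache and is read again. Entries for old timestamps are never evicted in this sketch, so a long-lived process might want to key on the path alone and store the timestamp alongside the value.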

KirillOsenkov avatar May 13 '22 21:05 KirillOsenkov

We need this for RAR and ResolvePackageFileConflicts, and potentially others as well. ResolvePackageFileConflicts is slow and not incremental.

KirillOsenkov avatar May 13 '22 21:05 KirillOsenkov

The SDK's assembly resolution is presumably done in dotnet.dll, whereas RAR is in msbuild.dll. Would they be able to communicate, or were you just thinking of speeding up dotnet.dll's tasks?

Forgind avatar May 13 '22 21:05 Forgind

Each can have their own independent cache, but we'd want to speed up both.

KirillOsenkov avatar May 14 '22 00:05 KirillOsenkov

If there are concerns about the DLL changing over time, the MVID can be read quickly from the first bytes of the DLL.

drewnoakes avatar May 14 '22 02:05 drewnoakes

I think a timestamp is good enough and won't require reading the file at all. If the timestamp has changed we can just bypass the cache and do what we'd normally do.

But yeah, I benchmarked reading MVID a while back: https://github.com/KirillOsenkov/MetadataTools/blob/main/src/PEFile/MvidBenchmark.cs

KirillOsenkov avatar May 14 '22 03:05 KirillOsenkov

We looked at this with a few people on the team and decided it isn't an avenue worth pursuing. Only one SDK task actually resolves similar data to RAR, and that one task only does so for a few assemblies per build, so merging the cache infrastructure would likely be more work than it's worth.

Forgind avatar Aug 23 '22 16:08 Forgind