msbuild
Contention of Project Evaluation in parallel builds
Issue Description
Project evaluation in parallel builds suffers from contention, causing evaluations that normally take 20-30 ms to take over 1000 ms.
Steps to Reproduce
Create a solution with many small projects, enough to saturate your CPU; I used four times as many projects as CPU threads. The contents of each project are not relevant, as I used the "Clean" target to do the least amount of work. I used nearly identical projects to remove variables. The projects have no project-to-project (P2P) references, to maximize throughput. Node reuse was disabled (nodeReuse:false) in all cases.
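A small script can stamp out the near-identical projects described above. This is only a sketch, assuming you supply your own template project file: the `Project000`-style naming, the `.proj` extension, and the `clone_projects` helper are illustrative and not part of the original report. The placeholder template in the example is not claimed to reproduce the slowdown, and adding the generated projects to a solution is left out.

```python
import os
import tempfile
from pathlib import Path


def project_count(cpu_threads: int, factor: int = 4) -> int:
    """Number of projects to generate: `factor` times the CPU thread count."""
    return cpu_threads * factor


def clone_projects(template_text: str, out_dir: Path, count: int,
                   extension: str = ".proj") -> list[Path]:
    """Write `count` identical copies of the template, one per folder."""
    paths = []
    for i in range(count):
        project_dir = out_dir / f"Project{i:03d}"
        project_dir.mkdir(parents=True, exist_ok=True)
        path = project_dir / f"Project{i:03d}{extension}"
        path.write_text(template_text, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Placeholder body; substitute the real project type you want to measure.
    template = '<Project>\n  <Target Name="Clean" />\n</Project>\n'
    count = project_count(os.cpu_count() or 1)
    with tempfile.TemporaryDirectory() as tmp:
        created = clone_projects(template, Path(tmp), count)
        print(f"generated {len(created)} projects")
```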
Case 1: msbuild /t:clean /bl /v:q
Case 2: msbuild /t:clean /bl /v:q /m
I used a binlog to record the results and set verbosity to quiet to avoid console output noise. Observe the project evaluation times of all projects.
Data & Analysis
This image is the trace of a single-node build (case 1). Observe that each evaluation took 20-30 ms, except for the initial project.
This image is the trace of a multi-node build (case 2). Observe that the first evaluation took the same time as in case 1. Once the parallel nodes started, the first evaluation on each node took seconds; evaluations of subsequent projects were faster. Notice that node 1 also slows down.
Theory
I theorize that there is a single-threaded file cache service that handles file I/O. The file cache probably serializes the data back to the nodes while holding the lock, thus blocking other nodes from using it. Node 0 is also affected by the contention, which rules out the cost of starting a "new" node as the cause. An alternative is an evaluation cache whose lock is held for the entire duration of an evaluation.
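To see how a single serialized resource could turn a 20-30 ms evaluation into one that takes about a second, here is a back-of-the-envelope model. It is a sketch under assumed numbers, not a measurement: it assumes all requests arrive at the same moment and that the shared resource handles exactly one request at a time. The function name and parameters are hypothetical.

```python
def observed_times_ms(requests: int, service_ms: float) -> list[float]:
    """Observed latency per request when one lock serves requests in order.

    All requests are assumed to arrive at t=0, so request i waits for the
    i requests ahead of it and then takes `service_ms` itself.
    """
    return [(i + 1) * service_ms for i in range(requests)]


if __name__ == "__main__":
    # Assumed inputs: 32 concurrent evaluations at 30 ms each.
    times = observed_times_ms(requests=32, service_ms=30.0)
    print(f"first: {times[0]:.0f} ms, last: {times[-1]:.0f} ms")
    # first: 30 ms, last: 960 ms
```

Under these assumptions the last of 32 queued requests sees roughly 960 ms, which is the same order of magnitude as the slowdown in the report. That makes the theory plausible but does not confirm it.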
Thanks to "msbuild /profileEvaluation", I got a few more hints.
$([Microsoft.Build.Utilities.ToolLocationHelper]::GetLatestSDKTargetPlatformVersion($(SDKIdentifier), $(SDKVersion))) takes 3-4 ms warm and 180 ms cold. While the results are cached, there is a lock in RetrieveTargetPlatformList(). The same applies to ToolLocationHelper::GetPlatformSDKLocation(), as it calls RetrieveTargetPlatformList().
There are also a few instances of "exists" conditions that take 4-6 ms. Hopefully those results are cached.
Disclaimer: I'm not a maintainer, but as far as I know, the CachingFileSystemWrapper is used for Exists evaluation.
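The pattern described here, where results are cached but every lookup still goes through a lock, can be illustrated in isolation. The sketch below is hypothetical Python, not the ToolLocationHelper code: `LockedCache` takes the lock on every call, while `FastPathCache` checks the cache first and only locks on a miss. The counter exists only to make the difference visible.

```python
import threading
from typing import Callable, Dict


class LockedCache:
    """Takes the lock on every lookup, even when the value is cached."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute
        self._values: Dict[str, str] = {}
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def get(self, key: str) -> str:
        with self._lock:
            self.lock_acquisitions += 1
            if key not in self._values:
                self._values[key] = self._compute(key)
            return self._values[key]


class FastPathCache(LockedCache):
    """Checks the cache without the lock first; locks only on a miss."""

    def get(self, key: str) -> str:
        cached = self._values.get(key)  # unlocked read on the hot path
        if cached is not None:
            return cached
        return super().get(key)  # re-checks under the lock before computing


if __name__ == "__main__":
    for cache_type in (LockedCache, FastPathCache):
        cache = cache_type(lambda key: key.upper())
        for _ in range(100):
            cache.get("sdk")
        print(cache_type.__name__, cache.lock_acquisitions)
```

The unlocked read relies on CPython's dictionary reads being safe alongside a single locked writer; in another runtime the equivalent would be a concurrent map or a reader-writer lock.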
Just thinking out loud, if the main MSBuild node could copy over its caches to the child nodes, then that would save load time.
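As a rough illustration of that idea, the sketch below has a parent serialize its cache and a child start from that snapshot instead of from empty. Everything here is hypothetical: the `NodeCache` class, the JSON format, and the example key are made up for illustration and do not reflect how MSBuild nodes actually communicate.

```python
import json
from typing import Callable, Dict


class NodeCache:
    """A per-node cache that can export and import a snapshot."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute
        self._values: Dict[str, str] = {}
        self.misses = 0

    def get(self, key: str) -> str:
        if key not in self._values:
            self.misses += 1
            self._values[key] = self._compute(key)
        return self._values[key]

    def snapshot(self) -> str:
        """Serialize the current entries so another node can reuse them."""
        return json.dumps(self._values)

    def seed(self, snapshot: str) -> None:
        """Load entries produced by `snapshot()` on another node."""
        self._values.update(json.loads(snapshot))


if __name__ == "__main__":
    def slow_lookup(key: str) -> str:
        return f"resolved:{key}"  # stand-in for an expensive cold lookup

    parent = NodeCache(slow_lookup)
    parent.get("LatestSDKTargetPlatformVersion")

    child = NodeCache(slow_lookup)
    child.seed(parent.snapshot())
    child.get("LatestSDKTargetPlatformVersion")
    print(f"parent misses: {parent.misses}, child misses: {child.misses}")
    # parent misses: 1, child misses: 0
```

The trade-off is that a snapshot costs time to serialize and transfer, and it can go stale if the underlying state changes after it is taken.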
@yuehuang010 Is this still active? How serious do you think it is? What priority would you give it?
This is important if MSBuild is to be used as a project system and integrated into an IDE. The current performance makes it hard to scale to many cores and large solutions.
On Tue, Jan 10, 2023 at 9:51 PM Roman Konecny @.***> wrote:
@yuehuang010 Is this still active? How serious do you think it is? What priority would you give it?
Just thinking out loud, if the main MSBuild node could copy over its caches to the child nodes
Off topic, but is there still discussion of moving some nodes into the same process, in cases where tasks are known not to assume their own current directory and environment block? Without more rearchitecture there would still be serialization costs, but there would be other savings.
Without going too crazy, I think focusing on the simple problem of GetLatestSDKTargetPlatformVersion() is good enough: have only the initial node hold the ToolLocationHelper data, and have the other nodes request it.
On the other hand, I hear that multi-threaded MSBuild is making progress; perhaps that is good enough.