Contention of Project Evaluation in parallel builds

Open yuehuang010 opened this issue 2 years ago • 7 comments

Issue Description

Project evaluation in parallel builds suffers from contention, causing evaluations that normally take 20-30ms to take over 1000ms.

Steps to Reproduce

Create a solution with lots of small projects, enough to saturate your CPU. I used 4 times as many projects as CPU threads. The contents of each project are not relevant, as I used the "Clean" target to do the least amount of work. I used nearly identical projects to remove variables. The projects have no P2P references, to maximize throughput. Nodereuse:false in all cases.

Case 1: msbuild /t:clean /bl /v:q

Case 2: msbuild /t:clean /bl /v:q /m

I used a binlog to record the results and set verbosity to quiet to avoid console output noise. Observe the Project Evaluation times of all projects.

Data & Analysis

This image is the trace of a single node build (case 1). Observe that each evaluation took 20-30ms, except for the initial project. image

This image is the trace of a multi node build (case 2). Observe that the first evaluation took the same time as in case 1, but once the parallel nodes started, the first evaluation on each node takes seconds. Evaluations of subsequent projects are faster. Notice that node 1 is also slowed down. image

Theory

I theorize there is a single-threaded file cache service that handles file IO. The file cache probably serializes the data back to the nodes while holding the lock, thus blocking other nodes from using it. Node 0 is affected by the contention too, which rules out the cost of starting a "new" node as the cause. An alternative is an evaluation cache whose lock is held for the entire duration of the evaluation.
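
To get a feel for whether a single shared lock could account for numbers of this size, here is a back-of-the-envelope sketch in Python. It is a toy queueing model under stated assumptions, not measured data and not MSBuild code: every worker arrives at the same moment, and each one holds the lock for its whole evaluation. The function name, the worker count of 32 and the 30ms hold time are all hypothetical.

```python
def queued_latencies(workers, hold_ms):
    """Latency seen by each worker when all of them arrive at t=0 and
    must pass, one at a time, through a lock held for hold_ms."""
    # Worker i waits for the i workers ahead of it, then does its own work.
    return [(i + 1) * hold_ms for i in range(workers)]


# Hypothetical numbers: 32 nodes, 30 ms spent under the lock per evaluation.
latencies = queued_latencies(workers=32, hold_ms=30)
print(latencies[0], latencies[-1])  # 30 960
```

Under these assumptions the last node in the queue sees close to a second of latency for a 30ms piece of work, which is the same order of magnitude as the slowdown in case 2.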

yuehuang010 avatar May 14 '22 00:05 yuehuang010

Thanks to "msbuild /profileEvaluation", I got a few more hints.

image

$([Microsoft.Build.Utilities.ToolLocationHelper]::GetLatestSDKTargetPlatformVersion($(SDKIdentifier), $(SDKVersion))) takes 3-4ms warm and 180ms cold. While the results are cached, there is a lock in RetrieveTargetPlatformList(). The same applies to ToolLocationHelper::GetPlatformSDKLocation(), as it also calls RetrieveTargetPlatformList().

There are also a few instances of "exists" conditions that take 4-6ms. Hopefully those results are cached.
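
For illustration, one general way to keep a cached lookup from serializing its callers is to take the lock only on a cache miss and to re-check the entry once the lock is held. The Python sketch below shows that pattern in isolation. It is not the ToolLocationHelper implementation; the class name, the loader and the "sdk" key are hypothetical, and whether this pattern fits RetrieveTargetPlatformList() is not established here.

```python
import threading


class LockedOnMissCache:
    """Cache that takes its lock only when a key is missing."""

    def __init__(self, loader):
        self._loader = loader          # expensive lookup, e.g. a disk scan
        self._values = {}
        self._lock = threading.Lock()
        self.loads = 0                 # number of times the loader ran

    def get(self, key):
        # Fast path: a warm hit never touches the lock.
        try:
            return self._values[key]
        except KeyError:
            pass
        # Slow path: serialize only the cold load, and re-check after
        # acquiring the lock in case another thread filled the entry.
        with self._lock:
            if key not in self._values:
                self.loads += 1
                self._values[key] = self._loader(key)
            return self._values[key]


# Hypothetical usage: eight threads ask for the same key at once.
cache = LockedOnMissCache(lambda key: f"version-for-{key}")
threads = [threading.Thread(target=cache.get, args=("sdk",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With this shape only the cold load is paid under the lock, and later lookups of the same key return without waiting on each other.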

yuehuang010 avatar Jul 21 '22 02:07 yuehuang010

Disclaimer: not a maintainer, but afaik the CachingFileSystemWrapper is used for Exists evaluation.

Therzok avatar Jul 21 '22 04:07 Therzok

Just thinking out loud, if the main MSBuild node could copy over its caches to the child nodes, then that would save load time.
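
A minimal sketch of that idea in Python, assuming the cache can be represented as a plain string-to-string map. The function names, the cache key format and the placeholder values are hypothetical, and this says nothing about how MSBuild nodes actually exchange data.

```python
import json


def export_cache(cache):
    """Serialize the parent's warm cache into a payload for a child."""
    return json.dumps(cache, sort_keys=True)


def import_cache(payload, into):
    """Seed a child's cache from the payload, keeping entries the child
    has already computed for itself."""
    for key, value in json.loads(payload).items():
        into.setdefault(key, value)
    return into


# Hypothetical usage: the parent has one warm entry, the child starts cold.
parent_cache = {"sdk-identifier|sdk-version": "placeholder-version"}
child_cache = import_cache(export_cache(parent_cache), into={})
```

The child would then start with the parent's answers instead of repeating the cold lookups, at the price of serializing the payload once per node.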

yuehuang010 avatar Jul 21 '22 23:07 yuehuang010

@yuehuang010 Is this still active? How serious do you think it is? What priority would you give it?

rokonec avatar Jan 10 '23 13:01 rokonec

This is important if MSBuild wants to be used as a project system and integrated into an IDE. The perf makes it hard to scale up with many cores and large solutions.

yuehuang010 avatar Jan 10 '23 15:01 yuehuang010

Just thinking out loud, if the main MSBuild node could copy over its caches to the child nodes

Off topic, but is there still discussion of moving some nodes into the same process, in cases where tasks are known not to assume their own current directory and environment block? Without more rearchitecture there would still be serialization costs, but there would be other savings.

danmoseley avatar Jan 10 '23 15:01 danmoseley

Without going too crazy, I think focusing on the simple problem of GetLatestSDKTargetPlatformVersion() is good enough. Have only the initial node hold on to the ToolLocationHelper data, and have the other nodes request it.

On the other hand, I hear that the multithreaded MSBuild is making progress; perhaps that is good enough.

yuehuang010 avatar Jan 19 '23 23:01 yuehuang010