templating icon indicating copy to clipboard operation
templating copied to clipboard

OSX build job timeouts on dotnet SDK preview 5

Open vlada-shubina opened this issue 2 years ago • 4 comments

We had to revert arcade PR that updated dotnet SDK to preview 5: https://github.com/dotnet/templating/pull/4940 due to OSX jobs were constantly failing by timeout. Investigate the reason and provide a fix/workaround.

vlada-shubina avatar Jul 12 '22 08:07 vlada-shubina

Some observations: there are "long running tests" in almost every job (even successful), this should not be the case. It was happening even before https://github.com/dotnet/templating/pull/4940 We don't have any test set that is expected to be run for more than 1 minute.

vlada-shubina avatar Jul 14 '22 08:07 vlada-shubina

After collecting logs of dotnet new integration tests on diffent OS + Build configuration in 4 CI runs, I found long running test just happened to the followong tests which all execute dotnet new command along with post action(restore project), and dotnet build command.

On all OS + Build configuration these tests run longer than longRunningTestSeconds configured in xunit.runner.json more or less. Especially on OSX for each input it took longer time to run.

IntegrationTests.CommonTemplatesTests

Test Count of Inputs Duration of each input(seconds)
Windows Linux OSX
AllCommonProjectsCreateRestoreAndBuild 20 5 ~ 10 6 ~ 9 7 ~ 11
FileScopedNamespacesSupport 80 4 ~ 5.7 4.8 ~ 5.7 6 ~ 7.5
ImplicitUsingsSupport 120 4.5 ~ 12 5 ~ 12 6.5 ~ 25
NullableSupport 120 4.5 ~ 5.5 4.5 ~ 5.5 5 ~ 7
TopLevelProgramSupport 40 4.5 ~ 5.7 5 ~ 6 6 ~ 7

For each input, I think the test duration is acceptable. But based on big amount of inputs the test easily runs long time.

GangWang01 avatar Jul 27 '22 12:07 GangWang01

OSX tests seems to be visibly slower than other platforms. This might be concerning. Can we do further investigations and collect the data for all steps:

  • dotnet new (w/o restore post action)
  • dotnet restore
  • dotnet build

Also for two cases:

  • dotnet SDK preview 4
  • dotnet SDK preview 5+

The slowness might be because of different image configuration though, but we saw visible degradation when dotnet SDK was updated to preview 5.

vlada-shubina avatar Jul 27 '22 15:07 vlada-shubina

The test/command execution time has no obvious increase from SDK Preview 4 to SDK Preview5.

Please see average time(second) of each step on different platforms. Note: dotnet new Invocation means the time spent on the command execution inside the template engine.

dotnet SDK Preview 5+

dotnet new dotnet new Invocation dotnet restore dotnet build
Linux 1.901 1.4845 1.0395 1.68
OSX 3.5655 2.6525 1.862 3.0815
Windows 2.668 2.1365 1.426 2.4715

dotnet SDK Preview 4+

dotnet new dotnet new Invocation dotnet restore dotnet build
Linux 1.7905 1.393 0.969 1.5985
OSX 3.3755 2.498 1.717 2.949
Windows 2.4365 1.942 1.333 2.2515

From the data above, the time of command execution on OSX is bigger than others, especially dotnet new command(for all the long running tests, they all have post-action restoring project). In addition, comparing the time of dotnet new command and its invocation, the executuion has longer latency waiting for the process of the command to finish.

I guess the OSX machine of the pipeline has some performance issue. If there is a Mac, I can try to collect the data and check if it's true.

GangWang01 avatar Aug 05 '22 12:08 GangWang01

Collect more data of the average execution time from multiple runs with dotnet 7 Preview 5 & Preview 6(macOS 11), and runs with macOS 12 and macOS 10.15(Preview 5).

Preview 5+ (macOS 11)

dotnet new dotnet new Invocation Post Action(restore project) dotnet restore dotnet build
Linux 2.064173 1.6137879 1.1281816 1.1283 1.9132817
OSX 3.493359 2.6076145 1.8707573 1.7644535 3.183685
Windows 2.483639 1.9859322 1.385201 1.3089821 2.408805

Preview 6+ (macOS 11)

dotnet new dotnet new Invocation Post Action(restore project) dotnet restore dotnet build
Linux 2.0932648 1.6323674 1.1433156 1.1404357 1.9470499
OSX 3.490867 2.603189 1.8777177 1.8183358 3.2083027
Windows 2.5734446 2.0557199 1.4389324 1.3509214 2.503212

macOS 12 (Preview 5+)

dotnet new dotnet new Invocation Post Action(restore project) dotnet restore dotnet build
Linux 2.191716 1.7414826 1.1195469 1.1304715 1.908879
OSX 4.0858107 3.107426 2.0789165 1.878 3.4767885
Windows 2.6624708 2.1495178 1.4171677 1.3428249 2.4505095

macOS 10.15 (Preview 5+)

dotnet new dotnet new Invocation Post Action(restore project) dotnet restore dotnet build
Linux 2.47348 1.9679581 1.2686051 1.2540607 2.1497252
OSX 2.869632 2.3418262 1.5360022 1.4697607 2.657915
Windows 2.3912294 1.9303422 1.2621934 1.2144928 2.2108123

From the data of macOS 10.15 compared to the data of macOS 11 and macOS 12, it's suspicious that allocated computing resources such CPU & memory size are not increased for macOS machine while higher version of macOS requires them for better performance.

GangWang01 avatar Aug 23 '22 07:08 GangWang01

@GangWang01 does dotnet new invocation include post action execution time?

vlada-shubina avatar Aug 23 '22 07:08 vlada-shubina

@GangWang01 does dotnet new invocation include post action execution time?

Yes, it does.

GangWang01 avatar Aug 23 '22 07:08 GangWang01

Let's do the following tests:

  • macOS 10.15, macOS 11, macOS 12, Windows, Linux - preview 4
  • macOS 10.15, macOS 11, macOS 12, Windows, Linux - preview 5

and capture the following data

  • the execution time of dotnet new process
  • the total time of dotnet new (inside the process)
  • the template instantiation time: https://github.com/dotnet/sdk/blob/054e72dd640a857adea9f0cab0b37dfee866b327/src/Cli/Microsoft.TemplateEngine.Cli/TemplateInvoker.cs#L163
  • post action execution time: https://github.com/dotnet/sdk/blob/054e72dd640a857adea9f0cab0b37dfee866b327/src/Cli/Microsoft.TemplateEngine.Cli/TemplateInvoker.cs#L218
  • the execution time of dotnet restore process
  • the execution time of dotnet build process

vlada-shubina avatar Aug 23 '22 08:08 vlada-shubina

Captured the execution time mentioned above.
dotnet 7.0.100-preview.4.22178.9

dotnet new process dotnet new execution template instantiation post action(restore project) dotnet restore process dotnet build process
Linux 2.2087092 1.763950867 0.436288517 1.126641267 1.097844967 2.0284816
Windows 2.463470733 1.998035267 0.5060258 1.2932189 1.24554 2.350077467
macOS 10.15 2.5888598 2.1052604 0.562038 1.302152 1.271835 2.4018087
macOS 11 3.084912 2.5056243 0.6749609 1.5507138 1.4974926 2.8483784
macOS 12 2.9101703 2.362975 0.6624255 1.4282473 1.3800725 2.6093628

dotnet 7.0.100-preview.5.22307.18

dotnet new process dotnet new execution template instantiation post action(restore project) dotnet restore process dotnet build process
Linux 2.394860267 1.906371733 0.46408956 1.2281999 1.2233042 2.0828921
Windows 2.621546267 2.115408267 0.52079316 1.3900868 1.317997767 2.416077367
macOS 10.15 2.8541722 2.32902 0.5608944 1.5239769 1.48888 2.6458135
macOS 11 3.8817263 2.94531 0.66346455 2.0003474 1.863845 3.3911843
macOS 12 3.8916516 2.9557867 0.69756556 1.975874 1.875765 3.345523

GangWang01 avatar Sep 02 '22 07:09 GangWang01

@GangWang01 could you please specify the exact version of SDK that you have used? is there any branch we can refer to which shows actual run times?

vlada-shubina avatar Sep 05 '22 13:09 vlada-shubina

Updated the specific version of SDK above. And linked the PRs running in CI as reference, the ones with the title containing "Preview 4" run with dotnet 7.0.100-preview.4.22178.9. Others run with dotnet 7.0.100-preview.5.22307.18.

GangWang01 avatar Sep 06 '22 06:09 GangWang01

Moved to https://github.com/dotnet/sdk/issues/27686, closing this one.

vlada-shubina avatar Sep 06 '22 08:09 vlada-shubina