sdk
sdk copied to clipboard
Regression: `dotnet test` hangs for 15 minutes after test run completed on mac and linux
Our builds on VSTS include dotnet test
, which restores, builds, and tests our product. With dotnet CLI 2.1.100 all went well. But when we upgraded to 2.1.300, the dotnet tool doesn't exit for 15 minutes after the test run is completed with no output to explain the delay.
Expected behavior
The dotnet tool exits immediately, as can be seen on this 2.1.100 run:
Actual behavior
The dotnet tool exits after waiting 15 minutes, as can be seen on this 2.1.300 run:
Steps to reproduce
Please contact me via my microsoft.com email address for links to the builds if you'd like to investigate further.
I am not sure what you are doing in VSTS, but I suspect this might be long running servers that are still live when the test finishes.
In 2.1.300, we now have a long running VBCSCompiler and a MSBuild node re-use server and a Razor compilation server.
They all stay around for some time to avoid JITing and improve perf if you attempt a build again.
You can turn them off by running dotnet build-server shutdown
after the test execution. Or, you can set msbuild properties/env variables to prevent them from running to begin with.
I suspect this might be long running servers that are still live when the test finishes.
Perhaps. But why would that prevent the tool from exiting?
You can turn them off by running
dotnet build-server shutdown
after the test execution
How would I run that on VSTS after test execution, since test execution itself doesn't return control to the next step in my build till the 15 minutes are over?
Or, you can set msbuild properties/env variables to prevent them from running to begin with.
I'm willing to try that. Where do I find the variables to set? And if it works, is that a bug in dotnet CLI then that it's waiting for those servers when the work is finished anyway?
@peterhuene can you help @AArnott in getting the properties to turn off the long running servers?
@AArnott I don't understand enough about how VSTS does test execution to know why it would consider as if test was not done in that case.
Use -p:UseRazorBuildServer=false
to disable the Razor (rzc
) server.
Use -p:UseSharedCompilation=false
to disable the Roslyn (vbcscompiler
) server.
For MSBuild, pass /nodeReuse:false
on the command line to disable node re-use.
I don't know of anything that might be waiting on the servers. There was an issue where MSBuild might hang with node re-use enabled on non-Windows platforms (https://github.com/Microsoft/msbuild/issues/3161), but that was fixed for RTM.
Would it be possible to capture the stacks of the hang?
FWIW, I'm seeing hangs on Linux for dotnet build
unless I disable node reuse too. This happens on a machine where I'm running BenchmarkDotNet benchmarks, which does its own builds as well - so it's possible that something in that is jamming a build node. (My next step will be to find a way of disabling build reuse globally...)
I can confirm that setting MSBUILDDISABLENODEREUSE=1
on mac and linux removes the hang at the end of the build+test invocation as well.
FWIW, I vaguely recall a bug in a Windows process my team owned where a parent process spawned my child process, and so long as my child process was running the parent process never entirely exited. We fixed it (I don't remember how but could potentially find out). I wonder if something similar is happening here, where dotnet test
(which builds first) spawned the reusable node processes and children (instead of as top-level processes perhaps?) and now the parent process cannot exit till the reusable nodes timeout and exit. In that case, perhaps the fix is the change how the child processes are spawned so that they do not retain a child relationship with the parent.
@rainersigwald Some reports of hitting hangs with node reuse above.
This sounds exactly like what https://github.com/Microsoft/msbuild/commit/e604f5f42cb277d02a3b4bf4ce7136c1e10a5eee addressed. But evidently that was incomplete or doesn't work everywhere.
@jskeet @AArnott what's the best way for me to repro your cases? Can I just clone nodatime (before nodatime/nodatime#1157) and run build/runbenchmarks.sh
?
@rainersigwald: I believe that should repro it, yes. You'll need to specify the targets to run - netcoreapp2.0 netcoreapp2.1
is what I do normally. It's slightly harder to tell as I've normally run it as part of a cron job. I'd be happy to try it beforehand if that would help. Note that it's expected that it takes multiple hours, but there should be a lot of output a lot of the time.
When it's broken, you should be able to run dotnet build
anywhere (e.g. in src/NodaTime) and it'll hang.
@jskeet Oh, that's interesting. It repros with dotnet build
and not just dotnet test
?
I think I understand the dotnet test
problem: I think we're accidentally running the before-test build
within the context of the
https://github.com/dotnet/cli/blob/7ce47778b4a5238db133cd0d4ffc3aa2695e406c/src/dotnet/commands/dotnet-test/Program.cs#L99
environment, leading the build nodes to be created in a hangy way.
But that wouldn't apply to build
so I'll have to try debugging through the nodatime repro.
@rainersigwald: I don't run dotnet test
at all (in this workflow, or on that machine) - it repros just using dotnet run
with BenchmarkDotNet, but that does some more builds in the background. I don't know all the details of what it does, but I doubt that it calls dotnet test
.
But once it's in the "broken" state, then yes, dotnet build
hangs - as does dotnet restore
. (Specifying -nodeReuse:false
on the command line makes both work.)
I'm happy to try to get it into the broken state again and then poke around - are there diagnostics that would be useful to try at that point?
I just want to chime in and say that I'm experiencing this issue as well. I haven't been able to reproduce it reliably yet, but I'll keep trying and once I do, I'll post a link to a repo here.
The scenario I have is a unit test that is testing some MSBuild targets files. This involves calling dotnet build
during the unit test. This is basically what it does:
- Write a project file that uses the targets to disk.
- Compile the project using
dotnet build
. - Look at the output from the build and assert some stuff.
When running this via dotnet test
, it sometimes hangs for 15 minutes after the tests complete (the test duration reported is only a few seconds), but sometimes it doesn't hang at all.
@reduckted You should be able to resolve that by calling dotnet build -nodereuse:false
inside your test. What's happening for you is that the outer dotnet test
sets the environment variable that MSBuild uses to decide whether to attach the console or not. That is propagated down through child processes to the test process and then to the test's child dotnet build
process, which then starts with node reuse but without the safe disconnect-console node startup. Explicitly specifying "no node reuse" should cause the worker nodes to exit immediately after the build, allowing the wait-on-child-process code to complete quickly.
Thanks @rainersigwald. 👍 I had tried that after reading some of the earlier posts and the hangs appeared to stop, but since I couldn't reliably reproduce it to begin with, I wasn't 100% sure it was the solution.
FYI, this is happening for me on Ubuntu 18.04 and with dotnet 2.2.300. Passing -nodereuse:false
to dotnet test
solves this problem for me.
/usr/bin/dotnet test --test-adapter-path . --framework netcoreapp2.2 --logger xunit;LogFilePath=/home/.../output/testresults-linux.xml -nodereuse:false
I can confirm that this still happens on .NET Core 3.1
ubuntu@VM-0-4-ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
root@195d912bb800:/# dotnet --version
3.1.100
Test run for ***********************(.NETCoreApp,Version=v3.1)
Microsoft (R) Test Execution Command Line Tool Version 16.3.0
Copyright (c) Microsoft Corporation. All rights reserved.
Starting test execution, please wait...
A total of 1 test files matched the specified pattern. <---- hangs forever
-nodereuse:false
doesn't work either.
I can confirm that this still happens on .NET Core 3.1
ubuntu@VM-0-4-ubuntu:~$ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.1 LTS Release: 18.04 Codename: bionic
root@195d912bb800:/# dotnet --version 3.1.100
Test run for ***********************(.NETCoreApp,Version=v3.1) Microsoft (R) Test Execution Command Line Tool Version 16.3.0 Copyright (c) Microsoft Corporation. All rights reserved. Starting test execution, please wait... A total of 1 test files matched the specified pattern. <---- hangs forever
-nodereuse:false
doesn't work either.
This the exact spot where my test run also hangs.
One interesting point I noticed is that the same code without changes runs in Visual Studio for Mac (the very first test method runs ok, and the others take 2+ minutes each - pure unit tests with mocking - so not even hitting the db). Not sure if that can help solve the issue.
dotnet version: 3.1.201
I too was facing the same issue where tests hang when executed through Visual Studio or through command line. In my case, the issue turned out to be happening because of a Nuget package I was using to generate Fake objects - AutoBogus . The library has an open issue for the same problem here AutoBogus Issue. When I switched the package to use Bogus the library which AutoBogus uses, it fixed the issue. My tests are running smoothly now. Sharing this here, just in case if it helps any one.
I am still seeing this behavior on Azure DevOps with .NET Core 3.1.301 on a Windows Hosted Agent with VS2019
running dotnet test within a powershell script causes it to hang at the end of the execution. willing to share build and repro details.
Eg: Get's stuck here for 40+ minutes
CC: @rainersigwald @peterhuene @livarcocc
Same issue on Windows Server 2019 Standard
dotnet version of 5.0.100-preview.8.20417.9
Passing -nodereuse:false
to dotnet test does not solves this issue.
I think I might have found the culprit. dotnet test
expects to be run in a terminal window
When run in an emacs shell
[bence@x250:/tmp]$ dotnet new xunit --name XUnitTiming
The template "xUnit Test Project" was created successfully.
Processing post-creation actions...
Running 'dotnet restore' on XUnitTiming/XUnitTiming.csproj...
Restore completed in 463.81 ms for /tmp/XUnitTiming/XUnitTiming.csproj.
Restore succeeded.
[bence@x250:/tmp]$ echo $TERM
dumb
[bence@x250:/tmp]$ time dotnet test XUnitTiming/XUnitTiming.csproj
Test run for /tmp/XUnitTiming/bin/Debug/netcoreapp3.1/XUnitTiming.dll(.NETCoreApp,Version=v3.1)
Microsoft (R) Test Execution Command Line Tool Version 16.3.0
Copyright (c) Microsoft Corporation. All rights reserved.
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Test Run Successful.
Total tests: 1
Passed: 1
Total time: 1.2366 Seconds
real 0m23.992s
user 0m3.246s
sys 0m0.379s
[bence@x250:/tmp]$
When run in xterm:
[bence@x250:/tmp]$ echo $TERM
xterm
[bence@x250:/tmp]$ time dotnet test XUnitTiming/XUnitTiming.csproj
Test run for /tmp/XUnitTiming/bin/Debug/netcoreapp3.1/XUnitTiming.dll(.NETCoreApp,Version=v3.1)
Microsoft (R) Test Execution Command Line Tool Version 16.3.0
Copyright (c) Microsoft Corporation. All rights reserved.
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Test Run Successful.
Total tests: 1
Passed: 1
Total time: 1.2068 Seconds
real 0m2.699s
user 0m2.961s
sys 0m0.318s
[bence@x250:/tmp]$
That 20 seconds in the first case is spent after the line
A total of 1 test files matched the specified pattern.
Where I assume it is expecting to read something from the terminal
Hey folks, i'm facing this issue too, i have a project with BDD tests using specflow and mocking external api's using wiremock.net, when the pipeline run the line A total of 1 test files matched the specified pattern.
stay stucked for 16 minutes, and it's just 5 tests scenarios.
Hello Guys, I tried all options said here, but nothing work.
I solves my problem (happening only in azure devops integration) turning off parallel test executing
In my test project > Add folder Properties
> Add AssemblyInfo.cs
using Xunit;
[assembly: CollectionBehavior(DisableTestParallelization = true)]
I'm experiencing some test hanging within GitLab and docker image for the dotnet SDK 5.0 and this seems a quite common issue. See https://github.com/microsoft/vstest/issues/2080
One question for those of you experiencing this. Are you using Hosted services? I can reproduce the issue only when using IHostedService
, but if I disable that, all is good.
For example, I've got the following:
public class InfrastructureInitializer
: IHostedService
{
private readonly EventBusManager _eventBusManager;
public InfrastructureInitializer(EventBusManager eventBusManager)
{
_eventBusManager = eventBusManager;
}
public async Task StartAsync(CancellationToken cancellationToken)
{
var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(10));
await _eventBusManager.Start(cancellationTokenSource.Token);
}
public async Task StopAsync(CancellationToken cancellationToken)
{
var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(10));
await _eventBusManager.Stop(cancellationTokenSource.Token);
}
}
and registered in DI container
services.AddSingleton<IHostedService, InfrastructureInitializer>();
Notice there is a cancellation token that should be respected (and locally it does) if for whatever reason my event bus is unreachable. It seems that a deadlock is caused by this despite of properly awaiting code and having the cancellation token.
Is it because of tests running in parallel? Is that a problem when using IHostedService
? Why in Windows or WSL2 ubuntu I'm not having this issue then?
Verified the deadlock issue does not happen for the IHostedService
when disabling parallelization as per Thiago's suggestion.
In my functional tests project I create a Properties/AssemblyInfo.cs
to disable parallelization only in Release mode, because locally in Debug I don't find the issue.
using Xunit;
#if DEBUG
[assembly: CollectionBehavior(DisableTestParallelization = false)]
#else
[assembly: CollectionBehavior(DisableTestParallelization = true)]
#endif
I am seeing this issue on CircleCI with a Windows Server 2019 VM. CircleCI's no_output_timeout
is causing it to report failure when it actually succeeded.
SDK Version: 5.0.302
Passed! - Failed: 0, Passed: 1687, Skipped: 0, Total: 1687, Duration: 4 m 39 s - ProtoPromiseTests.dll (net5.0)
Too long with no output (exceeded 10m0s): context deadline exceeded
Passing -nodereuse:false
seems to help, though I've only ran 1 test since I added it, so I'm not sure how stable it is.
I can reproduce the issue only when using
IHostedService
, but if I disable that, all is good.
I just reported a similar bug, related to Task.Delay in IHostedService.StopAsync()
:
Deadlock in integration tests after upgrading to ASP.NET Core 6 https://github.com/dotnet/aspnetcore/issues/38649
I am using Xunit and was able to fix this issue by adding the following to csproj:
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.0.0" />