Updating from .NET 8 SDK to .NET 9 SDK Preview 4 causes `dotnet test` to hang forever
Description
So far, I don't have an isolated repro. But it happens for Uno Platform Wasm UI tests that we execute via dotnet test. The tests are passing, but dotnet test isn't terminating.
Upon investigation, I found:
So it looks like somehow, the VSTestTask2 isn't terminating. It's stuck there forever. Setting MSBUILDENSURESTDOUTFORTASKPROCESSES environment variable to 1 does the trick for now.
Steps to reproduce
We haven't yet got a minimal repro.
Expected behavior
Actual behavior
Diagnostic logs
Environment
The vstesttask2 task starts an exe and waits for it to exit. It will sit there for as long as the exe is running. When you look in test explorer, do you see vstest.console / dotnet running under this process? Do you also see testhost running under the vstest.console process?
> vstesttask2 task starts an exe
Is it the testhost.exe? That one terminates correctly
I have a dump of the dotnet process, in case that can help, I'll send it to you.
I've got your dump, it looks like the tool task is simply waiting for a child process to exit. The child process is vstest.console.
In the logs of vstest.console I can see it exited, but I also see that the process ID is different from the one in the dump file, so these are probably from two different runs. Not a big problem, but could you double check that vstest.console is stopped under the task? If you put a long wait in your test, you should see one under some MSBuild node, and then it should exit.
You could also try using -nodereuse:false, that will disable using "cached" MSbuild nodes, and will start a new one for this run, if it still stays stuck it makes it much easier to see what process is stuck, because they all run under the terminal process.
@nohwnd Oh. The logs I sent to @Evangelink were from a different run than the one where I took the dump, I think. However, when I saw the task waiting for a process, I couldn't find that process ID in Task Manager at all, so it was strange that it was waiting for a process that had somehow already exited.
I'll try to delay the test and see if I can find more information.
That is indeed weird, and in that case it would be an MSBuild bug (not that I am trying to ditch responsibility, but we are fully relying on ToolTask to do this). Let me know what you find, and I will talk with the MSBuild team if there is a problem in ToolTask.
Great. I'll double check my analysis and try to get more info and get back to you
We ran into this issue in https://github.com/dotnet/sdk/pull/41198 CI.
The CI test step would time out after 30 min on each attempt.
After adding MSBUILDENSURESTDOUTFORTASKPROCESSES=1 the test step finishes in less than 4 min.
cc @ViktorHofer @rainersigwald @Forgind
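For anyone scripting the workaround mentioned above in CI, here is a rough Python sketch of launching `dotnet test` with the environment variable set. The variable name and the `-nodereuse:false` switch come from this thread; the helper name `build_dotnet_test_invocation` and its parameters are hypothetical, and the sketch only builds the command and environment rather than claiming anything about how the SDK handles them.

```python
import os


def build_dotnet_test_invocation(extra_args=(), disable_node_reuse=False):
    """Return (command, environment) for a `dotnet test` run with the workaround.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # Workaround reported in this thread.
    env["MSBUILDENSURESTDOUTFORTASKPROCESSES"] = "1"

    cmd = ["dotnet", "test"]
    if disable_node_reuse:
        # Diagnostic switch suggested in this thread: start fresh MSBuild nodes.
        cmd.append("-nodereuse:false")
    cmd.extend(extra_args)
    return cmd, env


# Usage (requires the .NET SDK on PATH):
#   import subprocess
#   cmd, env = build_dotnet_test_invocation(disable_node_reuse=True)
#   subprocess.run(cmd, env=env, check=False)
```

Copying `os.environ` keeps the rest of the CI environment intact, so the only difference from a normal run is the one added variable.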
On a hunch, can someone try setting MSBUILDNODEWINDOW to 1 to see if that also resolves the problem?
Actually, I guess I can do that. I'll try to get that started later today.
This is now on the work list for the MSBuild team, and for me, to fix. We still don't know where it is happening, though. So if you have any additional info or a repro, it would be very welcome. In particular, please double check whether vstest.console is or is not running while the hang is observed, and collect diagnostic logs of dotnet test.
There is a quirk with the WaitForExit() method when the parent process reads stdout asynchronously. If the child process starts a grandchild process, the parent's WaitForExit() also waits for the grandchild to exit. It keeps blocking even after the child process has exited.
I'm not saying it's the root cause in this situation, but it's possible. Our ToolTask uses WaitForExit(), so I can try to avoid this situation on our side.
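The effect described above can be reproduced outside MSBuild. What follows is a minimal Python sketch, not ToolTask's code: Python's `subprocess` stands in for .NET's `Process`, and the names `CHILD_SCRIPT` and `measure` are made up for illustration. It assumes the grandchild inherits the child's redirected stdout. The child exits almost immediately, yet the parent only sees end-of-stream on the pipe once the grandchild, which still holds the write end, exits as well.

```python
import subprocess
import sys
import time

# Child: starts a grandchild that inherits the child's stdout (the parent's
# pipe), prints one line, and exits without waiting for the grandchild.
CHILD_SCRIPT = r"""
import subprocess, sys
subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(2)"],
    stdout=sys.stdout,
)
print("child done", flush=True)
"""


def measure():
    """Return (seconds until child exit, seconds until stdout EOF, output)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT], stdout=subprocess.PIPE
    )
    proc.wait()  # returns as soon as the child itself has exited
    exit_elapsed = time.monotonic() - start
    output = proc.stdout.read()  # blocks until every writer closed the pipe
    eof_elapsed = time.monotonic() - start
    return exit_elapsed, eof_elapsed, output


if __name__ == "__main__":
    exit_elapsed, eof_elapsed, output = measure()
    print(f"child exited after {exit_elapsed:.2f}s")
    print(f"stdout reached EOF after {eof_elapsed:.2f}s")
```

A wait that also drains redirected output to end-of-stream behaves like the second measurement, which would match the symptom of waiting on a process ID that no longer exists.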
Not sure if that would be related to https://github.com/dotnet/runtime/issues/103384
What I described relates to the issue you mentioned. @Youssef1313, could you please try to find out whether there is a process that was started by the testhost, and terminate it? If that unblocks MSBuild, then the problem is in our codebase and should be fixed. The workaround is to use a different overload of the WaitForExit method.
I may not be able to re-test this soon-ish, but IIRC, when WaitForExit was stuck, I wasn't able to find a matching process id that's open. So it felt like that process already terminated but WaitForExit was still blocking and didn't return.
Yes, if testhost started another process with redirected output, then our WaitForExit will wait for the grandchild process to exit. So far I don't have another idea what is happening.
I implemented a workaround, but I had to revert my changes because they caused a problem with the exit code :( Still, it would be great to know if there is a hanging grandchild process. In that case, the testing team could make a change here and not exit before their child process terminates.
This indeed happens when the test itself creates child processes (for instance: MSSQL localdb, or webdrivers/browsers) and fails to terminate these child processes. dotnet test will then indeed hang.
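On the test side, the usual mitigation is to make sure helper processes are always torn down, even when a test fails. The sketch below shows the pattern in Python as a language-neutral illustration; it is not taken from any of the affected test suites, and `helper_process` is a hypothetical name. A .NET test would do the equivalent in its fixture's dispose or cleanup method.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def helper_process(args, timeout=5.0):
    """Start a helper process and guarantee it is gone when the block ends."""
    proc = subprocess.Popen(args)
    try:
        yield proc
    finally:
        proc.terminate()  # ask the process to stop
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if it ignores the request
            proc.wait()


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    with helper_process(sleeper) as proc:
        print("helper running:", proc.poll() is None)
    print("helper stopped:", proc.poll() is not None)
```

Note that this only stops the direct helper process; if the helper starts further processes of its own, those need their own cleanup.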
Still happens with SDK 9.0.100.
Looks to be connected to capturing process output:
This change adds an option to fully disable capturing standard output. Setting it unblocked my run: https://github.com/microsoft/vstest/pull/4998
The option is enabled by setting the VSTEST_DISABLE_STANDARD_OUTPUT_CAPTURING=1 environment variable.
BUT we've been capturing the output like this for a long time, we were just redirecting it to null. So it should be a workaround, and not a cause of the issue.
https://github.com/nohwnd/Cuemon/blob/3126ca48266b9ad493524a06ea9b77c79e907a30/.github/workflows/pipelines.yml#L157-L158
Actually no matter what I do I cannot repro the hang when running on 9.0.100, so I am not sure if the option above did anything.
I can confirm this is now working for Uno: https://github.com/unoplatform/uno/pull/18841
OP confirmed this is working for them now.
@Sebazzz do you have a repro of your problem please? If it still does repro, please start a new issue.