vstest icon indicating copy to clipboard operation
vstest copied to clipboard

Hard to distinguish between a test host crash because of a hang or an actual crash

Open ViktorHofer opened this issue 5 years ago • 6 comments

Example log:

C:\helix\work\workitem>C:\helix\work\correlation\dotnet.exe C:\helix\work\correlation\sdk\6.0.0\vstest.console.dll System.Net.Security.Tests.dll --Settings:.runsettings --Platform:x64 --Blame:"CollectDump;CollectAlways=true;DumpType=Full;CollectHangDump;HangDumpType=Full;TestTimeout=900s" -- RunConfiguration.DotNetHostPath="C:\helix\work\correlation\dotnet.exe" 
Microsoft (R) Test Execution Command Line Tool Version 16.8.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Data collector 'Blame' message: VSTEST_DUMP_PATH is specified. Dump files will be saved in: C:\helix\work\workitem\uploads, and won't be added to attachments..
Blame: Attaching crash dump utility to process dotnet (8144).
[xUnit.net 00:00:07.96]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyTls13CipherSuiteAllowedButChosenProtocolsDoesNotAllowIt_Fails [SKIP]
[xUnit.net 00:00:07.97]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyNonTls13CipherSuiteAllowedButOtherSideDoesNotAllowIt_Fails [SKIP]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyTls13CipherSuiteAllowedButChosenProtocolsDoesNotAllowIt_Fails [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyNonTls13CipherSuiteAllowedButOtherSideDoesNotAllowIt_Fails [1 ms]
[xUnit.net 00:00:07.97]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowSameTwoOnBothSidesLessPreferredIsTls13_Success [SKIP]
[xUnit.net 00:00:07.97]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.NegotiatedCipherSuite_SslProtocolIsTls13_ShouldBeTls13 [SKIP]
[xUnit.net 00:00:07.98]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowSomeCipherSuitesWithNoEncryptionOption_Fails [SKIP]
[xUnit.net 00:00:07.98]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyNonTls13CipherSuitesAllowedButChosenProtocolDoesNotAllowIt_Fails [SKIP]
[xUnit.net 00:00:07.98]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_ThreeCipherSuitesWithTwoOverlapping_Success [SKIP]
[xUnit.net 00:00:07.99]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowedCipherSuitesIncludesSubsetOfInput_Success [SKIP]
[xUnit.net 00:00:07.99]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowOneOnOneSideTls13_Success [SKIP]
[xUnit.net 00:00:07.99]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowTwoOnBothSidesWithSingleOverlapNonTls13_Success [SKIP]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowSameTwoOnBothSidesLessPreferredIsTls13_Success [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.NegotiatedCipherSuite_SslProtocolIsTls13_ShouldBeTls13 [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowSomeCipherSuitesWithNoEncryptionOption_Fails [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyNonTls13CipherSuitesAllowedButChosenProtocolDoesNotAllowIt_Fails [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_ThreeCipherSuitesWithTwoOverlapping_Success [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowedCipherSuitesIncludesSubsetOfInput_Success [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowOneOnOneSideTls13_Success [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowTwoOnBothSidesWithSingleOverlapNonTls13_Success [1 ms]
[xUnit.net 00:00:08.01]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_TwoCipherSuitesWithAllOverlapping_Success [SKIP]
[xUnit.net 00:00:08.01]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.Tls13IsSupported_GetValue_ReturnsTrue [SKIP]
[xUnit.net 00:00:08.02]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowTwoOnBothSidesWithNoOverlapNonTls13_Fails [SKIP]
[xUnit.net 00:00:08.02]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyTls13CipherSuiteAllowedOtherSideDoesNotAllowTls13_Fails [SKIP]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_TwoCipherSuitesWithAllOverlapping_Success [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.Tls13IsSupported_GetValue_ReturnsTrue [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_AllowTwoOnBothSidesWithNoOverlapNonTls13_Fails [1 ms]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_OnlyTls13CipherSuiteAllowedOtherSideDoesNotAllowTls13_Fails [1 ms]
[xUnit.net 00:00:08.03]     System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_NothingAllowed_Fails [SKIP]
  Skipped System.Net.Security.Tests.NegotiatedCipherSuiteTest.CipherSuitesPolicy_NothingAllowed_Fails [1 ms]
[xUnit.net 00:00:08.78]     System.Net.Security.Tests.SslStreamEKUTest.SslStream_SelfSignedClientEKUClientAuth_Ok [SKIP]
  Skipped System.Net.Security.Tests.SslStreamEKUTest.SslStream_SelfSignedClientEKUClientAuth_Ok [1 ms]
[xUnit.net 00:00:08.78]     System.Net.Security.Tests.SslStreamEKUTest.SslStream_ServerEKUClientAuth_Fails [SKIP]
[xUnit.net 00:00:08.78]     System.Net.Security.Tests.SslStreamEKUTest.SslStream_ClientEKUServerAuth_Fails [SKIP]
[xUnit.net 00:00:08.78]     System.Net.Security.Tests.SslStreamEKUTest.SslStream_NoEKUServerAuth_Ok [SKIP]
[xUnit.net 00:00:08.79]     System.Net.Security.Tests.SslStreamEKUTest.SslStream_NoEKUClientAuth_Ok [SKIP]
  Skipped System.Net.Security.Tests.SslStreamEKUTest.SslStream_ServerEKUClientAuth_Fails [1 ms]
  Skipped System.Net.Security.Tests.SslStreamEKUTest.SslStream_ClientEKUServerAuth_Fails [1 ms]
  Skipped System.Net.Security.Tests.SslStreamEKUTest.SslStream_NoEKUServerAuth_Ok [1 ms]
  Skipped System.Net.Security.Tests.SslStreamEKUTest.SslStream_NoEKUClientAuth_Ok [1 ms]
The active test run was aborted. Reason: Test host process crashed
Data collector 'Blame' message: VSTEST_DUMP_PATH is specified. Dump files will be saved in: C:\helix\work\workitem\uploads, and won't be added to attachments..
Data collector 'Blame' message: The specified inactivity time of 15 minutes has elapsed. Collecting hang dumps from testhost and its child processes.
Data collector 'Blame' message: VSTEST_DUMP_PATH is specified. Dump files will be saved in: C:\helix\work\workitem\uploads, and won't be added to attachments..
Data collector 'Blame' message: Dumping 8144 - dotnet.
Results File: C:\helix\work\workitem\TestResults\ContainerAdministrator_03766C25FF57_2020-10-12_16_26_49.trx
Html test results file : C:\helix\work\workitem\TestResults\TestResult_ContainerAdministrator_03766C25FF57_20201012_164227.html

Attachments:
  C:\helix\work\workitem\TestResults\5eaac373-c39f-4541-afe3-2459ab467952\Sequence_a8afda8e091d49939835686d0fa1543a.xml
Passed!  - Failed:     0, Passed:  4542, Skipped:    20, Total:  4562, Duration: 9 m 33 s - System.Net.Security.Tests.dll (netcoreapp6.0)
Test Run Aborted.

The active Test Run was aborted because the host process exited unexpectedly. Please inspect the call stack above, if available, to get more information about where the exception originated from.
The test running when the crash occurred: 
System.Net.Security.Tests.SslStreamAlertsTest.SslStream_StreamToStream_HandshakeAlert_Ok

This test may, or may not be the source of the crash.

The log says that the run was aborted because the test host process crashed but after that it says that the inactivity time has elapsed. I'm unsure if this means that the test process crashed because of a test crash or because a test hang.

AB#1583163

ViktorHofer avatar Oct 12 '20 18:10 ViktorHofer

I tried to tell to vstest.console that blame datacollector is killing the testhost, but I could not find a good way of wiring this together. This probably needs more love.

nohwnd avatar Oct 13 '20 08:10 nohwnd

In NuGet.Client team we use dotnet test --blame-hang and xUnit longRunningTestSeconds setting to enable long-running (hung) test detection. If any tests is running longer than time specified, xUnit will display the test as [Long Running Test] in the log followed by dotnet blame log.

[Long Running Test] 'NuGet.Commands.Test.RuntimeTargetsTests.RestoreTargets_RestoreWithRuntimesAsync', Elapsed: 00:14:03
[Long Running Test]' NuGet.Commands.Test.NETCoreProject2ProjectTests.NETCoreProject2Project_VerifyCompileForTransitiveSettings(expected: "BDXY", ab: True, bc: False, cd: True, ax: True, xy: True, yd: True)', Elapsed: 00:14:02
  Blame: Creating hang dump of process dotnet (5225).
  The active test run was aborted. Reason: Test host process crashed
  Data collector 'Blame' message: The specified inactivity time of 15 minute/s has elapsed. Collecting a dump and killing the test host process..

kartheekp-ms avatar Oct 28 '20 17:10 kartheekp-ms

@kartheekp-ms How does that relate to this issue?

ViktorHofer avatar Oct 28 '20 18:10 ViktorHofer

@ViktorHofer - The title of the issue says Hard to distinguish between a test host crash because of a hang or an actual crash. I agree with that. I thought using xUnit longRunningTestSeconds setting will help because it adds additional information [Long Running Test] to the test name before hang/crash dump was collected.

kartheekp-ms avatar Oct 28 '20 19:10 kartheekp-ms

I think it would be best if VSTest itself logs all the information that is useful to identify a hanging test and not rely on an adapter's feature for that.

ViktorHofer avatar Nov 03 '20 14:11 ViktorHofer

This needs additional synchronization with datacollector. What happens is that as soon as we kill testhost the on exit event is triggered, but vstest.console does not know that there is something killing the testhost on purpose, so it just reports a crash, and then blame data collector sends a message saying it killed testhost.

nohwnd avatar Jul 27 '22 13:07 nohwnd