vstest icon indicating copy to clipboard operation
vstest copied to clipboard

Test Host randomly crashing with message "##[error]The active test run was aborted. Reason: Test host process crashed"

Open ibiglari opened this issue 1 year ago • 28 comments

We have an MFC application and are using Microsoft::VisualStudio::CppUnitTestFramework for testing. VSTest@2 step is failing almost 50% of the time with the message ##[error]The active test run was aborted. Reason: Test host process crashed after the last step of the test routine is finished, as apparent from the log messages printed by Microsoft::VisualStudio::CppUnitTestFramework::Logger::WriteMessage(). The test successfully finishes when being executed directly via Visual Studio 17.8.4.

Pipeline Step:

  - task: VSTest@2
    displayName: Integration Tests
    inputs:
      testSelector: 'testAssemblies'
      testAssemblyVer2: 'Release\IntegrationTests.dll'
      searchFolder: '$(System.DefaultWorkingDirectory)'
      resultsFolder: '$(System.DefaultWorkingDirectory)/CoverageResults'
      runTestsInIsolation: true
      codeCoverageEnabled: true
      otherConsoleOptions: '/collect:"Code Coverage;Format=Xml" /platform:x86 /Diag:vstestlog.diag'
      testRunTitle: 'Run tests'
      rerunFailedTests: false

Steps to reproduce

Since this is a random occurrence, re-queuing the job usually resolves the issue.

Expected behavior

No crash during test execution

Actual behavior

Random crashes after last step of the test routine

Diagnostic logs

vstestlog.datacollector.24-01-11_08-49-42_80953_4.diag.log vstestlog.diag.log vstestlog.host.24-01-11_08-49-51_57026_4.diag.log

Environment

Agent name: 'Azure Pipelines 2'
Agent machine name: 'xxx'
Current agent version: '3.232.1'
Operating System
Microsoft Windows Server 2022
10.0.20348
Datacenter
Runner Image
Image: windows-2022
Version: 20240108.1.0
Included Software: https://github.com/actions/runner-images/blob/win22/20240108.1/images/windows/Windows2022-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20240108.1
Runner Image Provisioner
2.0.321.1
Current image version: '20240108.1.0'
Agent running as: 'VssAdministrator'

ibiglari avatar Jan 11 '24 09:01 ibiglari

@jakubch1 this looks like a CodeCoverage issue. Testhost just crashes and there is no error reported. Please have a look.

nohwnd avatar Jan 16 '24 11:01 nohwnd

@ibiglari you are using dynamic native instrumentation which is not fully stable. Please check: https://learn.microsoft.com/en-us/visualstudio/test/customizing-code-coverage-analysis?view=vs-2022#static-and-dynamic-native-instrumentation and enable for your projects: https://learn.microsoft.com/en-us/cpp/build/reference/profile-performance-tools-profiler?view=msvc-170 to use static native instrumentation.

After doing above you should not see in logs such errors:

TpTrace Information: 0 : 6604, 1, 2024/01/11, 08:49:47.363, 12024538561, datacollector.exe, StaticInstrumenter: searching in directory: D:\a\1\s\Release, recursive: True
TpTrace Information: 0 : 6604, 1, 2024/01/11, 08:49:50.835, 12059210729, datacollector.exe, StaticInstrumenter: native instrumentation result for D:\a\1\s\Release\IntegrationTests.dll: ErrorNoFixupData. Time taken: 00:00:03.4388736
TpTrace Information: 0 : 6604, 6, 2024/01/11, 08:49:50.929, 12060148132, datacollector.exe, StaticInstrumenter: native instrumentation result for D:\a\1\s\Release\SOFTWARE-TO-BE-TESTED.exe: ErrorNoFixupData. Time taken: 00:00:03.5328777

jakubch1 avatar Jan 16 '24 11:01 jakubch1

@jakubch1 so if I add /PROFILE to linker options, code coverage will automatically use static instrumentation instead of dynamic instrumentation? If not, how do I instruct test host to use static instrumentation in the pipeline?

ibiglari avatar Jan 19 '24 21:01 ibiglari

@jakubch1 I'm afraid adding /PROFILE did not fix the issue. Logs attached vstestlog.datacollector.24-01-20_23-20-38_09648_4.log vstestlog.host.24-01-20_23-20-51_39345_4.log vstestlog.log

Linker command line as extracted from build pipeline logs: Link: C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\bin\HostX86\x86\link.exe /ERRORREPORT:QUEUE /OUT:"D:\a\1\s\Release\SOFTWARE-TO-BE-TESTED.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:D:\a\_temp\LIBS\OpenSSL\HQ\lib /LIBPATH:CsdBase\Win32\Release /LIBPATH:CsdSystem\Win32\Release /LIBPATH:DbLib\Win32\Release /LIBPATH:Interfaces\Win32\Release /LIBPATH:Utils\Win32\Release odbc32.lib psapi.lib libcrypto.lib libssl.lib Crypt32.lib version.lib dbghelp.lib csdbase.lib csdsystem.lib dblib.lib interfaces.lib utils.lib /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /manifestinput:"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\Include\Manifest\dpiaware.manifest" /DEBUG /PDB:".\Release/SOFTWARE-TO-BE-TESTED.pdb" /MAP:"SOFTWARE-TO-BE-TESTED-Release.map" /SUBSYSTEM:WINDOWS /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"D:\a\1\s\Release\SOFTWARE-TO-BE-TESTED.lib" /MACHINE:X86 /PROFILE /SAFESEH Release\Eftsrv.res

ibiglari avatar Jan 20 '24 23:01 ibiglari

@jakubch1 so if I add /PROFILE to linker options, code coverage will automatically use static instrumentation instead of dynamic instrumentation? If not, how do I instruct test host to use static instrumentation in the pipeline?

Yes. Static instrumentation is better and more reliable so we have it enabled by default but if /PROFILE is missing static instrumentation is not working and we fallback to use dynamic instrumentation.

Please try to configure below flags in runsettings:

<EnableStaticNativeInstrumentation>True</EnableStaticNativeInstrumentation>
<EnableDynamicNativeInstrumentation>False</EnableDynamicNativeInstrumentation>

Full runsettings samples: https://github.com/microsoft/codecoverage/blob/main/docs/configuration.md

jakubch1 avatar Jan 22 '24 07:01 jakubch1

@jakubch1 Even after updating .runsettings I am seeing the same behaviour.

Please try to configure below flags in runsettings:

<EnableStaticNativeInstrumentation>True</EnableStaticNativeInstrumentation>
<EnableDynamicNativeInstrumentation>False</EnableDynamicNativeInstrumentation>

Full runsettings samples: https://github.com/microsoft/codecoverage/blob/main/docs/configuration.md

ibiglari avatar Jan 22 '24 10:01 ibiglari

@ibiglari could you please share logs?

jakubch1 avatar Jan 22 '24 13:01 jakubch1

@ibiglari something is wrong as I don't see

<EnableStaticNativeInstrumentation>True</EnableStaticNativeInstrumentation>
<EnableDynamicNativeInstrumentation>False</EnableDynamicNativeInstrumentation>

inside logs. Can you double check that this was added into <CodeCoverage> tag?

Did you configure:

runSettingsFile: # string. Settings file. 

inside task: VSTest@2?

jakubch1 avatar Jan 22 '24 21:01 jakubch1

@jakubch1

Added runSettingsFile to VSTest@2 task, double checked the pipeline execution log.

.
.
.
##########################################################################
DtaExecutionHost version 19.229.34210.1.
Starting TestExecution Model...
Result Attachments will be stored in LogStore
			<DataCollector friendlyName="Code Coverage" uri="datacollector://Microsoft/CodeCoverage/2.0" assemblyQualifiedName="Microsoft.VisualStudio.Coverage.DynamicCoverageDataCollector, Microsoft.VisualStudio.TraceCollector, Version=11.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">
				<Configuration>
					<CodeCoverage>
						<UseVerifiableInstrumentation>False</UseVerifiableInstrumentation>
						<EnableStaticNativeInstrumentation>True</EnableStaticNativeInstrumentation>
						<EnableDynamicNativeInstrumentation>False</EnableDynamicNativeInstrumentation>
					</CodeCoverage>
				</Configuration>
			</DataCollector>
		</DataCollectors>
	</DataCollectionRunSettings>	
	<Catch2Adapter>
		<FilenameFilter>^*Test</FilenameFilter>
	</Catch2Adapter>
</RunSettings>
Updated Run Settings:
<RunSettings>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="Code Coverage" uri="datacollector://Microsoft/CodeCoverage/2.0" assemblyQualifiedName="Microsoft.VisualStudio.Coverage.DynamicCoverageDataCollector, Microsoft.VisualStudio.TraceCollector, Version=11.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">
        <Configuration>
          <CodeCoverage>
            <UseVerifiableInstrumentation>False</UseVerifiableInstrumentation>
            <EnableStaticNativeInstrumentation>True</EnableStaticNativeInstrumentation>
            <EnableDynamicNativeInstrumentation>False</EnableDynamicNativeInstrumentation>
          </CodeCoverage>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>
  <Catch2Adapter>
    <FilenameFilter>^*Test</FilenameFilter>
  </Catch2Adapter>
  <RunConfiguration>
    <BatchSize>1000</BatchSize>
    <ResultsDirectory>D:\a\1\s\CoverageResults</ResultsDirectory>
  </RunConfiguration>
</RunSettings>
**************** Starting test execution *********************
.
.
.

New logs attached.

vstestlog.log vstestlog.host.24-01-22_21-22-54_31244_4.log

vstestlog.datacollector.24-01-22_21-22-41_03781_4.log

ibiglari avatar Jan 22 '24 21:01 ibiglari

@ibiglari currently you are instrumenting only 1 file: D:\a\1\s\IntegrationTests\Utils\Reflector\ReflectrService.dll. Is it required? Please try to exclude it to see what happen:

<ModulePaths><Exclude><ModulePath>.*ReflectrService\.dll$</ModulePath></Exclude></ModulePaths>

(inside <CodeCoverage>)

jakubch1 avatar Jan 23 '24 07:01 jakubch1

@jakubch1 No joy. Still seeing the crash. Logs attached.

Also, not sure what you mean by instrumenting only 1 file? Our test project is IntegrationTests,dll and pipeline console reflects this:

**************** Starting test execution *********************
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\Common7\IDE\Extensions\TestPlatform\vstest.console.exe "@D:\a\_temp\ds03vzkkih5.tmp"
Microsoft (R) Test Execution Command Line Tool Version 17.8.0 (x64)
Copyright (c) Microsoft Corporation.  All rights reserved.
vstest.console.exe "D:\a\1\s\Release\IntegrationTests.dll"
/Settings:"D:\a\_temp\wv4dzzinw1c.tmp.runsettings"
/EnableCodeCoverage
/InIsolation
/Logger:"trx"
/TestAdapterPath:"D:\a\1\s"
/collect:"Code Coverage;Format=Xml" /platform:x86 /Diag:vstestlog.diag
Starting test execution, please wait...
Logging Vstest Diagnostics in file: D:\a\1\s\vstestlog.diag
A total of 1 test files matched the specified pattern.
4.1736
Data collection : Logging DataCollector Diagnostics in file: D:\a\1\s\vstestlog.datacollector.24-01-23_21-27-55_24149_4.diag

vstestlog.log vstestlog.datacollector.24-01-23_21-27-55_24149_4.log vstestlog.host.24-01-23_21-28-13_32590_4.log

ibiglari avatar Jan 23 '24 21:01 ibiglari

@ibiglari could you please try to run it without code coverage enabled? We should check if this is even coverage related. Based on log currently your build doesn't make any instrumentation.

jakubch1 avatar Jan 24 '24 07:01 jakubch1

@jakubch1 I find that hard to believe, as I can see code coverage in SonarCloud. However, I will try and get back to you.

ibiglari avatar Jan 24 '24 23:01 ibiglari

@jakubch1 Turning off code coverage leads to successful execution of the test.

Logs attached vstestlog.log vstestlog.datacollector.24-01-25_02-45-39_84177_4.log vstestlog.host.24-01-25_02-45-55_30456_4.log

ibiglari avatar Jan 25 '24 12:01 ibiglari

@ibiglari thank you for info. Could you please experiment with below 4 flags:

<EnableStaticNativeInstrumentation>False</EnableStaticNativeInstrumentation>
<EnableDynamicNativeInstrumentation>False</EnableDynamicNativeInstrumentation>
<EnableStaticManagedInstrumentation>False</EnableStaticManagedInstrumentation>
<EnableDynamicManagedInstrumentation>False</EnableDynamicManagedInstrumentation>

I expect if all flags are False you don't have issue. Then you could 1 by 1 enable and check. If you can share your binaries I can also make this testing on my side. You can share privately binaries here: https://developercommunity.visualstudio.com/home

jakubch1 avatar Jan 25 '24 13:01 jakubch1

@jakubch1 I'm afraid I had a crash with all four set to false. vstestlog.datacollector.24-01-26_10-07-10_50220_4.log vstestlog.host.24-01-26_10-07-13_39508_4.log vstestlog.log

ibiglari avatar Jan 26 '24 10:01 ibiglari

@ibiglari any idea what is "Waiting for HQ Thread to Terminate..." ?

jakubch1 avatar Jan 26 '24 11:01 jakubch1

@jakubch1 Yes we set a mutex and wait for a thread to terminate... That's one of the log messages from the test routine. Why?

ibiglari avatar Jan 26 '24 11:01 ibiglari

I saw this in Test Platform logs. Could you please try this: https://github.com/microsoft/vstest/blob/main/docs/troubleshooting.md#collect-logs-and-crash-dumphang

jakubch1 avatar Jan 26 '24 11:01 jakubch1

@jakubch1 Now this is funny. If I add /Blame:"CollectDump;DumpType=Full;CollectHangDump;TestTimeout=30min;HangDumpType=Full" to the command line parameters for VSTest@2 task it doesn't crash. I can only guess this parameter changes the internal state of test host somehow?

The problem is, since this error happens randomly, it might be pure luck. I have to run the queue multiple times to ascertain whether a change in parameters causes a crash or not

ibiglari avatar Jan 26 '24 22:01 ibiglari

@jakubch1 Disregard previous comment. Test host crashed again, and this time this message was added to execution log:

Data collector 'Blame' message: procdump.exe could not be found, please make sure that the executable is available on PATH. Alternatively set PROCDUMP_PATH environment variable to a directory that contains procdump.exe executable.

How do I make this available to my DevOps build agent? Do I need to manually include the binary?

ibiglari avatar Jan 29 '24 06:01 ibiglari

@ibiglari please install proc dump (https://github.com/microsoft/vstest/blob/main/docs/troubleshooting.md#collect-process-dump-using-procdump-on-windows-ie-outofmemory) and set PROCDUMP_PATH when running your tests.

jakubch1 avatar Jan 29 '24 07:01 jakubch1

@jakubch1 Found an easier way. However, this meant the test ran for 40 minutes instead of 9 minutes, and then the pipeline timed out. Is that to be expected when using proc dump?

ibiglari avatar Jan 29 '24 09:01 ibiglari

@jakubch1 After adjusting job's timeout, I can confirm test host does not crash when proc dump is attached to it. However, once the test method is finished, there is a delay of about 30 minutes before test host returns and the pipeline can continue. I can only surmise that proc dump somehow changes internal state of test host. Any advise?

ibiglari avatar Feb 01 '24 04:02 ibiglari

@ibiglari in that case please remove /Blame:"CollectDump;DumpType=Full;CollectHangDump;TestTimeout=30min;HangDumpType=Full" stuff and before VSTest@2 step add step:

    - task: PowerShell@2
      displayName: 'Download and enable ProcDump'
      inputs:
        targetType: inline
        script: |
          Add-MpPreference -ExclusionPath $(Build.SourcesDirectory) -Force
          Set-MpPreference -DisableRealtimeMonitoring $true -Force
          Invoke-WebRequest -Uri "https://download.sysinternals.com/files/Procdump.zip" -OutFile $(Agent.TempDirectory)\Procdump.zip
          Expand-Archive -LiteralPath $(Agent.TempDirectory)\Procdump.zip -DestinationPath $(Agent.ToolsDirectory)\Procdump -Force
          mkdir -p "$(Build.SourcesDirectory)\artifacts\dumps"
          $(Agent.ToolsDirectory)\Procdump\procdump.exe -ma -i $(Build.SourcesDirectory)\artifacts\dumps -accepteula

and after VSTest@2 and step:

    - task: PowerShell@2
      displayName: 'Disable ProcDump'
      inputs:
        targetType: inline
        script: |
          $(Agent.ToolsDirectory)\Procdump\procdump.exe -u -accepteula
          Set-MpPreference -DisableRealtimeMonitoring $false -Force
      condition: always()

then please reproduce crash issue and provide us dumps. Please try to repro with:

<EnableStaticNativeInstrumentation>False</EnableStaticNativeInstrumentation>
<EnableDynamicNativeInstrumentation>False</EnableDynamicNativeInstrumentation>
<EnableStaticManagedInstrumentation>False</EnableStaticManagedInstrumentation>
<EnableDynamicManagedInstrumentation>False</EnableDynamicManagedInstrumentation>

jakubch1 avatar Feb 01 '24 07:02 jakubch1

@jakubch1 Will advise. I was installing procdump using a PowerShell task:

  - powershell: |
      choco install procdump -y
    displayName: Install Procdump
    condition: and(succeeded(), eq(variables['Agent.OS'], 'Windows_NT'))

I guess I need to remove it, right?

ibiglari avatar Feb 01 '24 21:02 ibiglari

@jakubch1 Just an update,

I have not been able to replicate the crash for the past week. The random nature of this crash makes it rather hard to investigate, but as soon as it happens, I will provide the logs here.

ibiglari avatar Feb 08 '24 22:02 ibiglari

Likely related to

#2952

which is intermittent for me to (although it fails most of the time).

Super frustrating that the vstest team does not seem to think the lack of diagnosabiliy here is a priority. I've wasted hours and hours on this.

tig avatar Jul 08 '24 00:07 tig

@tig what tools do you lack to diagnose the problem? In the linked issue you are getting full call stack on screen and you have a memory dump.

nohwnd avatar Jul 08 '24 06:07 nohwnd