runtime icon indicating copy to clipboard operation
runtime copied to clipboard

System.Data.OleDb.Test - timeout and hangs

Open karelz opened this issue 3 years ago • 13 comments
trafficstars

  Discovering: System.Data.OleDb.Tests (app domain = on [no shadow copy], method display = ClassAndMethod, method display options = None)
  Discovered:  System.Data.OleDb.Tests (found 50 of 85 test cases)
  Starting:    System.Data.OleDb.Tests (parallel test collections = on, max threads = 2)
['System.Data.OleDb.Tests' END OF WORK ITEM LOG: Command timed out, and was killed]

Roughly 1 failure per day, with 4 rolling runs affected in last 3 weeks. It is blocking CI, we should investigate it ASAP.

{ "ErrorPattern":".*'System.Data.OleDb.Tests' END OF WORK ITEM LOG: Command timed out, and was killed.*" } 

Report

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 0

Known Issue Error Message

Fill the error message using known issues guidance.

{
  "ErrorMessage": "",
  "BuildRetry": false
}

karelz avatar Aug 24 '22 09:08 karelz

Tagging subscribers to this area: @roji, @ajcvickers See info in area-owners.md if you want to be subscribed.

Issue Details

The test suite fails

Failures in last 120 days

Day Run Details
8/23 Rolling run 1960858 Helix timeout
8/22 Rolling run 1959521 Helix timeout
8/22 PR 1959387 Helix timeout
8/20 PR 1955186 Helix timeout
8/19 Rolling run 1954782 Helix timeout
8/18 PR 1952610 Helix timeout
8/18 PR 1952074 Long running tests - System.Data.OleDb.Tests.OleDbCommandBuilderTests.DeriveParameters_NullCommandText_Throws and System.Data.OleDb.Tests.OleDbCommandBuilderTests.DeriveParameters_InvalidCommandType_Throws
8/18 PR 1951129 Helix timeout
8/17 PR 1950076 Helix timeout
~8/16~ ~PR 1946316~ ~Tests passed - Console log~ - likely infra issue
8/16 Rolling run 1946264 Helix timeout
8/15 PR 1945616 Helix timeout
8/15 PR 1944502 Long running test - System.Data.OleDb.Tests.OleDbConnectionTests.ServerVersionTest
~8/12~ ~PR 1941350~ ~Assertion failed at System.String.Substring(Int32 startIndex, Int32 length) in /_/src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs:line 1836~ -- likely specific to the PR
~8/11~ ~2x PR 1938154~ ~Assert: ARRAY_SIZE(pCols) >= pTable->m_cCols at coreclr\md\runtime\metamodel.cpp Line: 736~ -- likely specific to the PR
8/11 PR 1937102 Helix timeout
8/10 PR 1935667 Long running test - System.Data.OleDb.Tests.OleDbCommandBuilderTests.DeriveParameters_InvalidCommandType_Throws
8/10 PR 1934364 Long running test - System.Data.OleDb.Tests.OleDbCommandBuilderTests.DeriveParameters_InvalidCommandType_Throws
8/8 PR 1930946 Helix timeout
8/8 PR 1930321 Long running test - System.Data.OleDb.Tests.OleDbCommandBuilderTests.DeriveParameters_InvalidCommandType_Throws
8/5 PR 1925814 Helix timeout
8/4 PR 1923196 Long running test - System.Data.OleDb.Tests.OleDbCommandBuilderTests.DeriveParameters_InvalidCommandType_Throws
8/3 PR 1920208 Long running test - System.Data.OleDb.Tests.OleDbCommandBuilderTests.DeriveParameters_InvalidCommandType_Throws
?-8/1 Console logs not available

Roughly 1 failure per day, with 4 rolling runs affected in last 3 weeks. It is blocking CI, we should investigate it ASAP.

Author: karelz
Assignees: -
Labels:

blocking-clean-ci, area-System.Data.OleDB

Milestone: -

ghost avatar Aug 24 '22 09:08 ghost

We hit this failure in a PR the 7.0 branch: https://github.com/dotnet/runtime/pull/74600

  • Job results: https://dev.azure.com/dnceng/public/_build/results?buildId=1967070&view=logs&j=55b384d0-d6b0-5894-fd8a-d754d713246e&t=8dcacd81-cb5e-5ac9-93e5-c3f19b19e19b&l=106
  • Failure log: https://helix.dot.net/api/2019-06-17/jobs/7437910e-a6f8-4979-9724-a91a7064099a/workitems/System.Data.OleDb.Tests/console
  • Log message:
Console log: 'System.Data.OleDb.Tests' from job 7437910e-a6f8-4979-9724-a91a7064099a workitem 910dc97e-cc9c-4968-8c77-e22a1a4d0046 (windows.81.amd64.open.svc) executed on machine a0003FJ running Windows-2012ServerR2-6.3.9600-SP0

C:\h\w\9ADB0871\w\B03B094F\e>taskkill.exe /f /im corerun.exe 
ERROR: The process "corerun.exe" not found.

C:\h\w\9ADB0871\w\B03B094F\e>call RunTests.cmd --runtime-path C:\h\w\9ADB0871\p 
----- start Thu 08/25/2022 19:42:51.37 ===============  To repro directly: ===================================================== 
pushd C:\h\w\9ADB0871\w\B03B094F\e\
"C:\h\w\9ADB0871\p\dotnet.exe" exec --runtimeconfig System.Data.OleDb.Tests.runtimeconfig.json --depsfile System.Data.OleDb.Tests.deps.json xunit.console.dll System.Data.OleDb.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

C:\h\w\9ADB0871\w\B03B094F\e>"C:\h\w\9ADB0871\p\dotnet.exe" exec --runtimeconfig System.Data.OleDb.Tests.runtimeconfig.json --depsfile System.Data.OleDb.Tests.deps.json xunit.console.dll System.Data.OleDb.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Data.OleDb.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Data.OleDb.Tests (found 50 of 85 test cases)
  Starting:    System.Data.OleDb.Tests (parallel test collections = on, max threads = 2)
----- end Thu 08/25/2022 19:43:43.54 ----- exit code -1073740771 ----------------------------------------------------------
2022-08-25T19:43:44.159Z	INFO   	run.py	run(48)	main	Beginning reading of test results.
2022-08-25T19:43:44.159Z	INFO   	run.py	__init__(42)	read_results	Searching 'C:\h\w\9ADB0871\w\B03B094F\e' for test results files
2022-08-25T19:43:44.159Z	INFO   	run.py	__init__(42)	read_results	Searching 'C:\h\w\9ADB0871\w\B03B094F\uploads' for test results files
2022-08-25T19:43:44.159Z	WARNING	run.py	__init__(55)	read_results	No results file found in any of the following formats: xunit, junit, trx
2022-08-25T19:43:44.159Z	INFO   	run.py	packing_test_reporter(30)	report_results	Packing 0 test reports to 'C:\h\w\9ADB0871\w\B03B094F\e\__test_report.json'
2022-08-25T19:43:44.159Z	INFO   	run.py	packing_test_reporter(33)	report_results	Packed 1414 bytes
read file: C:\h\w\9ADB0871\p\debug-dump-template.md
writing output file: C:\h\w\9ADB0871\w\B03B094F\uploads\how-to-debug-dump.md
done writing debug dump information
['System.Data.OleDb.Tests' END OF WORK ITEM LOG: Command exited with -1073740771]

carlossanlop avatar Aug 25 '22 21:08 carlossanlop

@jkotas @roji Does anything need doing here?

ajcvickers avatar Sep 06 '22 14:09 ajcvickers

I do not see any hits in last week, so there is a good chance that this is fixed.

jkotas avatar Sep 06 '22 14:09 jkotas

I still see hits on some PRs, some of them created 1-2 days ago. Reopening.

karelz avatar Sep 12 '22 21:09 karelz

Stack trace of the crash (the hangs seems to be caused by crash reporting taking too long, so same root cause):

ntdll!ZwWaitForMultipleObjects+0xa [d:\blue.obj.amd64fre\minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 894] 
ntdll!RtlReportExceptionEx+0x452 [d:\blue\minkernel\ntdll\wer.c @ 982] 
ntdll!RtlReportException+0xbc [d:\blue\minkernel\ntdll\wer.c @ 1164] 
comsvcs!ReportException+0x52 [d:\blue\com\complus\src\shared\util\svcerr.cpp @ 1704] 
comsvcs!UserCodeExceptionFilter+0x33 [d:\blue\com\complus\src\shared\util\svcerr.cpp @ 1724] 
comsvcs!ComSvcsExceptionFilter+0x33 [d:\blue\com\complus\src\comsvcs\comsvcs\comsvcs.cpp @ 734] 
comsvcs!`CHolder::SafeDispenserDriver::DestroyResource'::`1'::filt$0+0x1d [d:\blue\com\complus\src\comsvcs\dispman\cholder.cpp @ 216] 
ntdll!__C_specific_handler+0x96 [d:\blue\minkernel\crts\crtw32\misc\amd64\chandler.c @ 194] 
ntdll!RtlpExecuteHandlerForException+0xd [d:\blue\minkernel\ntos\rtl\amd64\xcptmisc.asm @ 131] 
ntdll!RtlDispatchException+0x197 [d:\blue\minkernel\ntos\rtl\amd64\exdsptch.c @ 535] 
ntdll!KiUserExceptionDispatch+0x3a [d:\blue\minkernel\ntos\rtl\amd64\trampoln.asm @ 715] 
ACECORE!JetTerm+0x50 [n:\src\ace\jet4\src\initterm.c @ 1062] 
ACEOLEDB!TerminateJetInstance+0x23 [n:\src\ace\jolt\src\classfac.cpp @ 166] 
ACEOLEDB!MsoHelper::TerminateJet+0x6b [n:\src\ace\jolt\src\classfac.cpp @ 236] 
ACEOLEDB!CDataSource::UtilUninitialize+0xa4 [n:\src\ace\jolt\src\dso.cpp @ 2376] 
ACEOLEDB!CDataSource::~CDataSource+0x37 [n:\src\ace\jolt\src\dso.cpp @ 114] 
ACEOLEDB!CDataSource::`scalar deleting destructor'+0x14
ACEOLEDB!CDataSource::Release+0x80 [n:\src\ace\jolt\src\dso.cpp @ 723] 
oledb32!CAcm::FinalRelease+0x55 [d:\9147\enduser\databaseaccess\src\mdac\oledb\src\cacm.cpp @ 518] 
oledb32!CDCM::FinalRelease+0x19 [d:\9147\enduser\databaseaccess\src\mdac\oledb\src\dcm.cpp @ 93] 
oledb32!ATL::CComPolyObject<CDCM>::FinalRelease+0x1d [d:\9147.public.fre\sdk\inc\atl21\atlcom.h @ 963] 
oledb32!ATL::CComPolyObject<CDCM>::~CComPolyObject<CDCM>+0x41 [d:\9147.public.fre\sdk\inc\atl21\atlcom.h @ 969] 
oledb32!ATL::CComPolyObject<CDCM>::Release+0x37 [d:\9147.public.fre\sdk\inc\atl21\atlcom.h @ 980] 
oledb32!CDCMCreator::DestroyResource+0x2a [d:\9147\enduser\databaseaccess\src\mdac\oledb\src\dcmcreator.cpp @ 509] 
comsvcs!CHolder::SafeDispenserDriver::DestroyResource+0x3a [d:\blue\com\complus\src\comsvcs\dispman\cholder.cpp @ 201] 
comsvcs!CHolder::ProcessDestroyList+0x78 [d:\blue\com\complus\src\comsvcs\dispman\cholder.cpp @ 1089] 
comsvcs!CHolder::FreeResource+0x99 [d:\blue\com\complus\src\comsvcs\dispman\cholder.cpp @ 627] 
oledb32!CDCMCreator::ReleaseResource+0x60 [d:\9147\enduser\databaseaccess\src\mdac\oledb\src\dcmcreator.cpp @ 373] 
oledb32!CDPO::ReturnDCMToPool+0x4042b [d:\9147\enduser\databaseaccess\src\mdac\oledb\src\dpo.cpp @ 595] 
oledb32!CDPO::FinalRelease+0x13 [d:\9147\enduser\databaseaccess\src\mdac\oledb\src\dpo.cpp @ 384] 
oledb32!ATL::CComPolyObject<CDPO>::FinalRelease+0x13 [d:\9147.public.fre\sdk\inc\atl21\atlcom.h @ 963] 
oledb32!ATL::CComPolyObject<CDPO>::{dtor}+0x29 [d:\9147.public.fre\sdk\inc\atl21\atlcom.h @ 969] 
oledb32!ATL::CComPolyObject<CDPO>::Release+0x64 [d:\9147.public.fre\sdk\inc\atl21\atlcom.h @ 980] 
System_Private_CoreLib!System.Runtime.InteropServices.Marshal.Release(IntPtr) + 165
System.Private.CoreLib.dll!System.Runtime.InteropServices.Marshal.Release(IntPtr) + 165
System.Data.OleDb.dll!System.Data.ProviderBase.WrappedIUnknown.ReleaseHandle() + 90
System.Private.CoreLib.dll!System.Runtime.InteropServices.SafeHandle.InternalRelease(Boolean) + 143
System.Private.CoreLib.dll!System.Runtime.InteropServices.SafeHandle.Dispose(Boolean) + 32
System.Private.CoreLib.dll!System.Runtime.InteropServices.SafeHandle.Dispose() + 37
System.Data.OleDb.dll!System.Data.OleDb.OleDbConnectionInternal.Dispose() + 174
System.Data.OleDb.dll!System.Data.ProviderBase.DbConnectionInternal.CloseConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory) + 590
System.Data.OleDb.dll!System.Data.OleDb.OleDbConnection.Close() + 103
System.Data.OleDb.Tests.dll!System.Data.OleDb.Tests.OleDbTestBase.Dispose(Boolean) + 250
TestUtilities.dll!System.IO.FileCleanupTestBase.Dispose() + 53
xunit.execution.dotnet.dll!Xunit.Sdk.ExecutionTimer.Aggregate(System.Action) + 53 [/_/src/xunit.execution/Sdk/Frameworks/ExecutionTimer.cs @ 31]
xunit.execution.dotnet.dll!ReflectionAbstractionExtensions.DisposeTestClass(Xunit.Abstractions.ITest, System.Object, Xunit.Sdk.IMessageBus, Xunit.Sdk.ExecutionTimer, System.Threading.CancellationTokenSource) + 244 [/_/src/xunit.execution/Extensions/ReflectionAbstractionExtensions.cs @ 79]
xunit.execution.dotnet.dll!Xunit.Sdk.TestInvoker`1+c__DisplayClass47_0[[System.__Canon, System.Private.CoreLib]].b__1() + 149 [/_/src/xunit.execution/Sdk/Frameworks/Runners/TestInvoker.cs @ 216]
xunit.core.dll!Xunit.Sdk.ExceptionAggregator.Run(System.Action) + 51 [/_/src/xunit.core/Sdk/ExceptionAggregator.cs @ 73]

jkotas avatar Sep 12 '22 23:09 jkotas

Note #76054 which could be related: after performing a query via the OleDB ACE driver, the process no longer exits properly. I've verified that switching to the SQL Server OleDB driver makes the problem go away, so presumably this would be an issue in the ACE driver itself. This could also be the root cause for the problem happening above, so switching to a different OleDB driver in the tests could be the "fix" here (though ideally, the ACE driver would be fixed...).

/cc @ajcvickers

roji avatar Sep 24 '22 19:09 roji

@karelz we have a report that installing a recent update to the ACE driver may make this go away (KB5002243, see https://github.com/dotnet/runtime/issues/76054#issuecomment-1270714060). Is it possible to make sure this KB is installed on our CI images?

roji avatar Oct 07 '22 09:10 roji

@roji this is also happening in the release/6.0 branch quite frequently, but it seems the automated arcade error message detection doesn't work in that branch.

So if there is a fix available, please backport it to 6.0.

Here is an example:

  • PR: https://github.com/dotnet/runtime/pull/77260
  • Queue: Libraries Test Run release coreclr windows x64 Debug
  • Job: https://dev.azure.com/dnceng-public/public/_build/results?buildId=71547&view=logs&j=55b384d0-d6b0-5894-fd8a-d754d713246e&t=8dcacd81-cb5e-5ac9-93e5-c3f19b19e19b&l=949
  • Log file: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-77260-merge-32d91221ef4140938e/System.Data.OleDb.Tests/1/console.ab829b1e.log?helixlogtype=result
Console log: 'System.Data.OleDb.Tests' from job 32d91221-ef41-4093-8eb2-a0830865a9ca workitem 29b3eca1-6f22-4419-aeff-1e2cb18e2638 (windows.11.amd64.client.open.svc) executed on machine a000JWD running Windows-10-10.0.22000-SP0

C:\h\w\9DE008DB\w\B30409A1\e>taskkill.exe /f /im corerun.exe 
ERROR: The process "corerun.exe" not found.

C:\h\w\9DE008DB\w\B30409A1\e>call RunTests.cmd --runtime-path C:\h\w\9DE008DB\p 
----- start Thu 11/03/2022  2:45:57.45 ===============  To repro directly: ===================================================== 
pushd C:\h\w\9DE008DB\w\B30409A1\e\
"C:\h\w\9DE008DB\p\dotnet.exe" exec --runtimeconfig System.Data.OleDb.Tests.runtimeconfig.json --depsfile System.Data.OleDb.Tests.deps.json xunit.console.dll System.Data.OleDb.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

C:\h\w\9DE008DB\w\B30409A1\e>"C:\h\w\9DE008DB\p\dotnet.exe" exec --runtimeconfig System.Data.OleDb.Tests.runtimeconfig.json --depsfile System.Data.OleDb.Tests.deps.json xunit.console.dll System.Data.OleDb.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Data.OleDb.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Data.OleDb.Tests (found 50 of 85 test cases)
  Starting:    System.Data.OleDb.Tests (parallel test collections = on, max threads = 2)
['System.Data.OleDb.Tests' END OF WORK ITEM LOG: Command timed out, and was killed]

carlossanlop avatar Nov 03 '22 18:11 carlossanlop

Here's another hit in a release/6.0 PR: https://github.com/dotnet/runtime/pull/77283

carlossanlop avatar Nov 03 '22 20:11 carlossanlop

@carlossanlop OK, thanks for flagging that. Is there anyone who can help with the suggestion in https://github.com/dotnet/runtime/issues/74488#issuecomment-1271366267? It's possible that some Windows updates on the CI machines would make this go away.

roji avatar Nov 05 '22 07:11 roji

The place to ask about this is https://teams.microsoft.com/l/team/19%3aa88bb61ffc1a4392ad38ebbc526c86f8%40thread.skype/conversations?groupId=4d73664c-9f2f-450d-82a5-c2f02756606d&tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47 Teams Channel.

jkotas avatar Nov 05 '22 12:11 jkotas

Since this is a known build error, I'm going to remove the blocking-clean-ci label.

steveisok avatar Feb 01 '23 17:02 steveisok