nativeaot/SmokeTests/Exceptions failing with `Assertion failed: (n_heaps <= heap_number) || !gc_t_join.joined()`
Assertion failed: (n_heaps <= heap_number) || !gc_t_join.joined(), file D:\a\_work\1\s\src\coreclr\gc\gc.cpp, line 6988
Return code: 1
Raw output file: C:\h\w\B51009A0\w\B29A098A\uploads\Reports\nativeaot.SmokeTests\Exceptions\Exceptions\Exceptions.output.txt
Raw output:
BEGIN EXECUTION
call C:\h\w\B51009A0\p\nativeaottest.cmd C:\h\w\B51009A0\w\B29A098A\e\nativeaot\SmokeTests\Exceptions\Exceptions\ Exceptions.dll
Exception caught!
Null reference exception in write barrier caught!
Null reference exception caught!
Test Stacktrace with exception on stack:
at BringUpTest.FilterWithStackTrace(Exception) + 0x28
at BringUpTest.Main() + 0x31c
at System.Runtime.EH.FindFirstPassHandler(Object, UInt32, StackFrameIterator&, UInt32&, Byte*&) + 0x188
at System.Runtime.EH.DispatchEx(StackFrameIterator&, EH.ExInfo&) + 0x161
at System.Runtime.EH.RhThrowEx(Object, EH.ExInfo&) + 0x4b
at BringUpTest.Main() + 0xaf
Exception caught via filter!
Expected: 100
Actual: 3
END EXECUTION - FAILED
Build Information
Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=715849
Build error leg or test failing: nativeaot\SmokeTests\Exceptions\Exceptions\Exceptions.cmd
Pull request: https://github.com/dotnet/runtime/pull/103821
Error Message
Fill the error message using step by step known issues guidance.
{
  "ErrorMessage": "Assertion failed: (n_heaps <= heap_number) || !gc_t_join.joined()",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
Report
| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 722197 | dotnet/runtime | nativeaot.SmokeTests.WorkItemExecution | dotnet/runtime#103801 |
| 715849 | dotnet/runtime | nativeaot.SmokeTests.WorkItemExecution | dotnet/runtime#103821 |
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Hit Count |
|---|---|---|
| 0 | 1 | 2 |
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas. See info in area-owners.md if you want to be subscribed.
Tagging subscribers to this area: @dotnet/gc. See info in area-owners.md if you want to be subscribed.
Looks like a DATAS race condition. @dotnet/gc Could you please take a look?
Note that the nativeaot\SmokeTests\Exceptions test is explicitly opted into server GC to get some server GC coverage during the default CI run.
Are there any dumps available? I can't seem to find them. Tried to repro locally to no avail. Seems like it's a low probability assertion failure (2 / month).
> Are there any dumps available? I can't seem to find them. Tried to repro locally to no avail. Seems like it's a low probability assertion failure (2 / month).
Yeah, it doesn't look like infra captured a dump for this.
There are 4 hits per month, but we don't have any dedicated server GC testing. This is the one and only test we run with server GC enabled. We rely on CoreCLR testing to catch GC bugs right now (even this test is not really testing server GC - it just verifies that setting the csproj property to enable server GC actually enables it).
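For reference, the usual opt-in is the `ServerGarbageCollection` MSBuild property in the project file, which flows into the `System.GC.Server` runtime setting. Below is a minimal sketch, not the actual Exceptions smoke test code, of the kind of runtime check that confirms the setting actually took effect; the class name and messages are made up for illustration:

```csharp
using System;
using System.Runtime;

internal static class ServerGcCheck
{
    // Hypothetical check, not the real smoke test: confirm that the
    // csproj-level server GC opt-in actually produced a server GC at runtime.
    private static int Main()
    {
        if (!GCSettings.IsServerGC)
        {
            Console.WriteLine("Expected server GC, but workstation GC is active.");
            return 1;
        }

        Console.WriteLine("Server GC is active.");
        return 100; // 100 is the success exit code these tests expect (see "Expected: 100" in the log above)
    }
}
```

Returning 100 on success mirrors the expected exit code visible in the raw output above; the actual test's verification logic is not shown here.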
@mrsharm @MichalStrehovsky are any dumps available for this, or is there a local repro?
I couldn't repro this locally, nor could I get to any dumps. My one guess (a long shot) is that this might be related to the other DATAS race condition we found via the Reliability Framework, where there is a race in GetHeap while change_heap_count is invoked, but without a dump it's difficult to validate.
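To make the suspected pattern concrete, here is a generic sketch of a reader validating an index against a count that another thread is changing, which is the general shape of a GetHeap-versus-change_heap_count race under DATAS. This is deliberately not the actual gc.cpp code and all names are made up: the point is only that an index valid against the count at the moment it was picked can fail an `index < count` check if the count shrinks in between.

```csharp
using System;
using System.Threading;

// Generic illustration of a "reader vs. changing count" race, loosely analogous
// to a per-heap index being checked against n_heaps while DATAS adjusts the
// heap count on another thread. NOT the runtime's code; names are invented.
internal static class HeapCountRaceSketch
{
    private static volatile int s_count = 8;   // stands in for n_heaps
    private static volatile bool s_done;
    private static long s_violations;

    private static void Main()
    {
        var reader = new Thread(() =>
        {
            var rng = new Random(42);
            for (int i = 0; i < 5_000_000; i++)
            {
                int index = rng.Next(s_count); // index is valid against the count read *now*
                // ...the count may change here, on the other thread...
                if (index >= s_count)          // the invariant an assert would check
                {
                    Interlocked.Increment(ref s_violations);
                }
            }
            s_done = true;
        });

        var changer = new Thread(() =>
        {
            // DATAS can both grow and shrink the heap count; oscillate so the
            // reader has many chances to observe a shrink between pick and check.
            while (!s_done)
            {
                for (int c = 8; c >= 1 && !s_done; c--) { s_count = c; Thread.Yield(); }
                for (int c = 1; c <= 8 && !s_done; c++) { s_count = c; Thread.Yield(); }
            }
        });

        reader.Start();
        changer.Start();
        reader.Join();
        changer.Join();

        Console.WriteLine($"Transient index >= count violations observed: {s_violations}");
    }
}
```

How often the final count is nonzero depends entirely on scheduling, which is consistent with the assert only firing a couple of times a month in CI.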
The Reliability Framework issue was fixed, correct? Looks like this issue reproed today.
> The Reliability Framework issue was fixed, correct? Looks like this issue reproed today.
It wasn't - I think we were still working on a solution. CC: @Maoni0.
Ah, ok. We can tag it as such then, and see if the repros stop after that is fixed.
I made a fix at https://github.com/dotnet/runtime/pull/106752.
@cshung, we should wait some time before confirming this issue has truly been fixed - I am observing that the bot is still picking up the same failures.
@mrsharm, wouldn't the bot reopen it if it finds new failures? I was hoping to confirm the fix that way. The builds the bot found seem to be either on 9.0 or from two days ago.