runtime icon indicating copy to clipboard operation
runtime copied to clipboard

Assertion failure 0 <= fd && fd < sysconf(_SC_OPEN_MAX) in System.Net.Mail.Functional.Tests

Open noahfalk opened this issue 3 years ago • 25 comments
trafficstars

Description

System.Net.Mail.Functional.Tests are failing with this assert in CI:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-72664-merge-3f079befb6de4fac81/System.Net.Mail.Functional.Tests/1/console.08ced1c9.log?helixlogtype=result

----- start Fri 22 Jul 2022 12:11:04 PM UTC =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Mail.Functional.Tests (found 154 of 156 test cases)
  Starting:    System.Net.Mail.Functional.Tests (parallel test collections = on, max threads = 2)
    System.Net.Mail.Tests.SmtpClientTest.TestGssapiAuthentication [SKIP]
      Condition(s) not met: "IsNtlmInstalled"
dotnet: /__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.

Reproduction Steps

Example CI build: https://dev.azure.com/dnceng/public/_build/results?buildId=1897299&view=ms.vss-test-web.build-test-results-tab

Expected behavior

Test doesn't fail in CI

Actual behavior

Test does fail in CI, see description.

Regression?

Unknown

Known Workarounds

Unknown

Configuration

Linux Debug x64 Mono Interpreter

Other information

No response

{ "ErrorMessage":"0 <= fd && fd < sysconf(_SC_OPEN_MAX)" } 

Report

Build Definition Test
12634 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
12900 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
11388 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
11100 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
10792 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
9631 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
9299 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
9022 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
8323 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
6868 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
5726 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
5660 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
5550 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
5247 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
3424 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
1979873 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution
1976601 dotnet/runtime System.Net.Mail.Functional.Tests.WorkItemExecution

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 17

noahfalk avatar Jul 26 '22 05:07 noahfalk

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

Description

System.Net.Mail.Functional.Tests are failing with this assert in CI:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-72664-merge-3f079befb6de4fac81/System.Net.Mail.Functional.Tests/1/console.08ced1c9.log?helixlogtype=result

----- start Fri 22 Jul 2022 12:11:04 PM UTC =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Mail.Functional.Tests (found 154 of 156 test cases)
  Starting:    System.Net.Mail.Functional.Tests (parallel test collections = on, max threads = 2)
    System.Net.Mail.Tests.SmtpClientTest.TestGssapiAuthentication [SKIP]
      Condition(s) not met: "IsNtlmInstalled"
dotnet: /__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.

Reproduction Steps

Example CI build: https://dev.azure.com/dnceng/public/_build/results?buildId=1897299&view=ms.vss-test-web.build-test-results-tab

Expected behavior

Test doesn't fail in CI

Actual behavior

Test does fail in CI, see description.

Regression?

Unknown

Known Workarounds

Unknown

Configuration

Linux Debug x64 Mono Interpreter

Other information

No response

Author: noahfalk
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

msftbot[bot] avatar Jul 26 '22 05:07 msftbot[bot]

Judging by the stacktrace and the job itself it's mono-interp

EgorBo avatar Jul 26 '22 05:07 EgorBo

Tagging subscribers to this area: @brzvlad See info in area-owners.md if you want to be subscribed.

Issue Details

Description

System.Net.Mail.Functional.Tests are failing with this assert in CI:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-72664-merge-3f079befb6de4fac81/System.Net.Mail.Functional.Tests/1/console.08ced1c9.log?helixlogtype=result

----- start Fri 22 Jul 2022 12:11:04 PM UTC =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Mail.Functional.Tests (found 154 of 156 test cases)
  Starting:    System.Net.Mail.Functional.Tests (parallel test collections = on, max threads = 2)
    System.Net.Mail.Tests.SmtpClientTest.TestGssapiAuthentication [SKIP]
      Condition(s) not met: "IsNtlmInstalled"
dotnet: /__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.

Reproduction Steps

Example CI build: https://dev.azure.com/dnceng/public/_build/results?buildId=1897299&view=ms.vss-test-web.build-test-results-tab

Expected behavior

Test doesn't fail in CI

Actual behavior

Test does fail in CI, see description.

Regression?

Unknown

Known Workarounds

Unknown

Configuration

Linux Debug x64 Mono Interpreter

Other information

No response

Author: noahfalk
Assignees: -
Labels:

untriaged, area-Codegen-Interpreter-mono

Milestone: -

msftbot[bot] avatar Jul 26 '22 05:07 msftbot[bot]

I pasted stacks here. It appears that the mail code, or underlying networking code, is attempting to use a file descriptor of -1, which I assume is invalid. https://github.com/dotnet/runtime/issues/72818#issuecomment-1194893236

danmoseley avatar Jul 26 '22 19:07 danmoseley

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details

Description

System.Net.Mail.Functional.Tests are failing with this assert in CI:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-72664-merge-3f079befb6de4fac81/System.Net.Mail.Functional.Tests/1/console.08ced1c9.log?helixlogtype=result

----- start Fri 22 Jul 2022 12:11:04 PM UTC =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Mail.Functional.Tests (found 154 of 156 test cases)
  Starting:    System.Net.Mail.Functional.Tests (parallel test collections = on, max threads = 2)
    System.Net.Mail.Tests.SmtpClientTest.TestGssapiAuthentication [SKIP]
      Condition(s) not met: "IsNtlmInstalled"
dotnet: /__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.

Reproduction Steps

Example CI build: https://dev.azure.com/dnceng/public/_build/results?buildId=1897299&view=ms.vss-test-web.build-test-results-tab

Expected behavior

Test doesn't fail in CI

Actual behavior

Test does fail in CI, see description.

Regression?

Unknown

Known Workarounds

Unknown

Configuration

Linux Debug x64 Mono Interpreter

Other information

No response

Author: noahfalk
Assignees: -
Labels:

area-System.Net, blocking-clean-ci, untriaged

Milestone: -

msftbot[bot] avatar Jul 26 '22 19:07 msftbot[bot]

cc: @tmds

seems like we do closing magic on invalid socket ... (stack bellow is copy from @danmoseley post)

top of stack looks like

#10 0x00007f4c7b4ab5cd in ToFileDescriptor (fd=-1) at /__w/1/s/src/native/libs/Common/pal_utilities.h:86
#11 0x00007f4c7b4abf95 in SystemNative_FcntlGetFD (fd=-1) at /__w/1/s/src/native/libs/System.Native/pal_io.c:611
...
	  at <unknown> <0xffffffff>
	  at Fcntl:<GetFD>g____PInvoke|5_0 <0x00020>
	  at Fcntl:GetFD <0x00020>
	  at System.Net.Sockets.SafeSocketHandle:TryUnblockSocket <0x0003a>
	  at System.Net.Sockets.SafeSocketHandle:CloseAsIs <0x000f4>
	  at System.Net.Sockets.Socket:Dispose <0x00426>
	  at System.Net.Sockets.Socket:Dispose <0x000a4>
	  at System.Net.Sockets.Socket:Close <0x00098>
	  at System.Net.Sockets.TcpClient:Dispose <0x0011c>
	  at System.Net.Sockets.TcpClient:Dispose <0x0001a>
	  at System.Net.Mail.SmtpConnection:ShutdownConnection <0x00184>
	  at System.Net.Mail.SmtpConnection:Abort <0x00012>
	  at System.Net.Mail.SmtpTransport:Abort <0x00078>
	  at System.Net.Mail.SmtpClient:Abort <0x0001c>
	  at System.Net.Mail.SmtpClient:SendAsyncCancel <0x00088>
	  at <>c:<SendMailAsync>b__84_1 <0x0001c>
	  at System.Threading.CancellationTokenSource:Invoke <0x00042>
or

#9  0x00007fbc510f8102 in __GI___assert_fail (assertion=0x7fbc4dcdc8a6 "0 <= fd && fd < sysconf(_SC_OPEN_MAX)", file=0x7fbc4dcddbb1 "/__w/1/s/src/native/libs/Common/pal_utilities.h", line=86, function=0x7fbc4dcdd527 "int ToFileDescriptor(intptr_t)") at assert.c:101
#10 0x00007fbc4dceb7cd in ToFileDescriptor (fd=-1) at /__w/1/s/src/native/libs/Common/pal_utilities.h:86
#11 0x00007fbc4dcebd8e in SystemNative_SetLingerOption (socket=-1, option=0x7fbc45d84d18) at /__w/1/s/src/native/libs/System.Native/pal_networking.c:1278
...
	  at <unknown> <0xffffffff>
	  at Sys:<SetLingerOption>g____PInvoke|34_0 <0x00024>
	  at Sys:SetLingerOption <0x00068>
	  at System.Net.Sockets.SocketPal:SetLingerOption <0x00092>
	  at System.Net.Sockets.Socket:SetLingerOption <0x00022>
	  at System.Net.Sockets.Socket:SetSocketOption <0x00248>
	  at System.Net.Sockets.Socket:set_LingerState <0x00024>
	  at System.Net.Sockets.TcpClient:set_LingerState <0x00022>
	  at System.Net.Mail.SmtpConnection:ShutdownConnection <0x000f0>
	  at System.Net.Mail.SmtpConnection:Abort <0x00012>
	  at System.Net.Mail.SmtpTransport:Abort <0x00078>
	  at System.Net.Mail.SmtpClient:Abort <0x0001c>
	  at System.Net.Mail.SmtpClient:SendAsyncCancel <0x00088>
	  at <>c:<SendMailAsync>b__84_1 <0x0001c>
	  at System.Threading.CancellationTokenSource:Invoke <0x00042>

wfurt avatar Jul 26 '22 19:07 wfurt

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details

Description

System.Net.Mail.Functional.Tests are failing with this assert in CI:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-72664-merge-3f079befb6de4fac81/System.Net.Mail.Functional.Tests/1/console.08ced1c9.log?helixlogtype=result

----- start Fri 22 Jul 2022 12:11:04 PM UTC =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Mail.Functional.Tests (found 154 of 156 test cases)
  Starting:    System.Net.Mail.Functional.Tests (parallel test collections = on, max threads = 2)
    System.Net.Mail.Tests.SmtpClientTest.TestGssapiAuthentication [SKIP]
      Condition(s) not met: "IsNtlmInstalled"
dotnet: /__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.

Reproduction Steps

Example CI build: https://dev.azure.com/dnceng/public/_build/results?buildId=1897299&view=ms.vss-test-web.build-test-results-tab

Expected behavior

Test doesn't fail in CI

Actual behavior

Test does fail in CI, see description.

Regression?

Unknown

Known Workarounds

Unknown

Configuration

Linux Debug x64 Mono Interpreter

Other information

No response

Author: noahfalk
Assignees: -
Labels:

area-System.Net.Sockets, blocking-clean-ci, untriaged

Milestone: -

msftbot[bot] avatar Jul 26 '22 19:07 msftbot[bot]

@rzikm can you please check how often it happens? Thanks!

karelz avatar Aug 03 '22 11:08 karelz

Very often, 98 hits in the last 14 days. Curiously, none of these are on main

rzikm avatar Aug 03 '22 14:08 rzikm

Aside from some authentication changes, #70046 would be biggest suspect. It may not be root cause as the assert is in Sockets. I tried to reproduce it (main on Linux) but I did not get hit. We can probably look at some of the Linux/Windows core files to see what particular tests are running.

wfurt avatar Aug 03 '22 17:08 wfurt

If it is happening that often perhaps we should disable the test for now to avoid noise in CI ... @rzikm thoughts?

karelz avatar Aug 07 '22 15:08 karelz

I think it was the same test we disabled in #73452 (SendMailAsync_CanBeCanceled_CancellationToken)

rzikm avatar Aug 08 '22 07:08 rzikm

Update: We are re-enabling the test in main via PR #74545 with changes that will hopefully give us more info on the root cause.

karelz avatar Aug 25 '22 16:08 karelz

BadExit may be misleading in queries... This is not event PR but main

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-ececf5ef6b4c4ff2a6/System.Net.Mail.Functional.Tests/1/console.8759b633.log?helixlogtype=result

----- start Thu Aug 25 12:40:46 UTC 2022 =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
./RunTests.sh: line 168: /root/helix/work/correlation/dotnet: No such file or directory
/root/helix/work/workitem/e

wfurt avatar Aug 26 '22 06:08 wfurt

@wfurt I found out that you need to specifically filter for 134 and 139 exit codes to get true process crashes

rzikm avatar Aug 26 '22 07:08 rzikm

ok. I updated my query. My original one was based on the runfo issues. So far so good.

wfurt avatar Aug 26 '22 07:08 wfurt

Not blocking CI as the test was disabled in #73452

karelz avatar Aug 26 '22 14:08 karelz

I font understand the comment @karelz. The suspected test was enabled again in https://github.com/dotnet/runtime/pull/74545 I did not see occurrences since. (still may come)

wfurt avatar Aug 26 '22 18:08 wfurt

Sure, but right now it is not blocking CI

karelz avatar Aug 26 '22 18:08 karelz

Status: After re-enabling the test, we got some hits on PRs. @rzikm has actionable dump link.

karelz avatar Aug 30 '22 16:08 karelz

this is from https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-74639-merge-d6f8b011644f4edc84/System.Net.Mail.Functional.Tests/1/console.694b6d7a.log?helixlogtype=result

(lldb) clrstack -a
OS Thread Id: 0xfe8 (1)
        Child SP               IP Call Site
00007F1D4AAD3008 00007f5e86cf9387 [InlinedCallFrame: 00007f1d4aad3008] Interop+Sys.<SetLingerOption>g____PInvoke|34_0(IntPtr, LingerOption*)
00007F1D4AAD3008 00007f5e08a331ed [InlinedCallFrame: 00007f1d4aad3008] Interop+Sys.<SetLingerOption>g____PInvoke|34_0(IntPtr, LingerOption*)
00007F1D4AAD3000 00007F5E08A331ED ILStubClass.IL_STUB_PInvoke(IntPtr, LingerOption*)
    PARAMETERS:
        <no data>
        <no data>

00007F1D4AAD30D0 00007F5E08D61E09 Interop+Sys.SetLingerOption(System.Runtime.InteropServices.SafeHandle, LingerOption*) [/_/src/libraries/System.Net.Sockets/src/Microsoft.Interop.LibraryImportGenerator/Microsoft.Interop.LibraryImportGenerator/LibraryImports.g.cs @ 772]
    PARAMETERS:
        socket (0x00007F1D4AAD3118) = 0x00007f1df583ad88
        option (0x00007F1D4AAD3110) = 0x00007f1d4aad3188
    LOCALS:
        0x00007F1D4AAD3108 = 0xffffffffffffffff
        0x00007F1D4AAD3104 = 0xffffffff00007f5e
        0x00007F1D4AAD30F8 = 0x0000000000000001
        0x00007F1D4AAD30F4 = 0x0000000000000000
        0x00007F1D4AAD30F0 = 0x0000000000000000


00007F1D4AAD3130 00007F5E08D61CF0 System.Net.Sockets.SocketPal.SetLingerOption(System.Net.Sockets.SafeSocketHandle, System.Net.Sockets.LingerOption) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs @ 1534]
    PARAMETERS:
        handle (0x00007F1D4AAD3198) = 0x00007f1df583ad88
        optionValue (0x00007F1D4AAD3190) = 0x00007f1df584a790
    LOCALS:
        0x00007F1D4AAD3188 = 0x0000000000000001
        0x00007F1D4AAD3184 = 0x0000000100000000
        0x00007F1D4AAD3178 = 0x0000000000000001
        0x00007F1D4AAD3174 = 0x0000000100007f1d

when I dump the SafeHandle, it has reasonable value...

=============================================================================
(lldb) dumpobj 0x7f1df583ad88
Name:        System.Net.Sockets.SafeSocketHandle
MethodTable: 00007f5e08a797f8
EEClass:     00007f5e08a82d00
Size:        80(0x50) bytes
File:        /mnt/work/B1FC09CD/p/shared/Microsoft.NETCore.App/8.0.0/System.Net.Sockets.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007f5e07bc9a80  4001127        8        System.IntPtr  1 instance 00000000000000A3 handle
00007f5e07b3f0f0  4001128       10         System.Int32  1 instance                8 _state
00007f5e07b3bbf0  4001129       14       System.Boolean  1 instance                1 _ownsHandle
00007f5e07b3bbf0  400112a       15       System.Boolean  1 instance                1 _fullyInitialized
00007f5e08a78c20  4000106       20         System.Int32  1 instance       -559038737 _closeSocketResult
00007f5e08a78c20  4000107       24         System.Int32  1 instance       -559038737 _closeSocketLinger
00007f5e07b3f0f0  4000108       28         System.Int32  1 instance                0 _closeSocketThread
00007f5e07b3f0f0  4000109       2c         System.Int32  1 instance                0 _closeSocketTick
00007f5e07b3f0f0  400010a       30         System.Int32  1 instance                0 _ownClose
00007f5e07b3bbf0  400010b       3c       System.Boolean  1 instance                1 <OwnsHandle>k__BackingField
00007f5e07b3bbf0  400010c       3d       System.Boolean  1 instance                0 _released
00007f5e07b3bbf0  400010d       3e       System.Boolean  1 instance                0 _hasShutdownSend
00007f5e07b3f0f0  400010e       34         System.Int32  1 instance               -1 _receiveTimeout
00007f5e07b3f0f0  400010f       38         System.Int32  1 instance               -1 _sendTimeout
00007f5e07b3bbf0  4000110       3f       System.Boolean  1 instance                0 _nonBlocking
00007f5e08aae4b0  4000111       18 ...ocketAsyncContext  0 instance 00007f1df583aef8 _asyncContext
00007f5e08a796a0  4000112       16         System.Int16  1 instance                2 _trackedOptions
00007f5e07b3bbf0  4000113       40       System.Boolean  1 instance                0 <LastConnectFailed>k__BackingField
00007f5e07b3bbf0  4000114       41       System.Boolean  1 instance                1 <DualMode>k__BackingField
00007f5e07b3bbf0  4000115       42       System.Boolean  1 instance                0 <ExposedHandleOrUntrackedConfiguration>k__BackingField
00007f5e07b3bbf0  4000116       43       System.Boolean  1 instance                0 <PreferInlineCompletions>k__BackingField
00007f5e07b3bbf0  4000117       44       System.Boolean  1 instance                1 <IsSocket>k__BackingField
00007f5e07b3bbf0  4000118       45       System.Boolean  1 instance                0 <IsDisconnected>k__BackingField

any idea how that becomes invalid @jkotas ? I could not confirm the -1 above but the first threads looks suspicious

(lldb) thread select 1
* thread #1, name = 'dotnet', stop reason = signal SIGABRT
    frame #0: 0x00007f5e86cf9387
->  0x7f5e86cf9387: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f5e86cf938d: ja     0x7f5e86cf93ad
    0x7f5e86cf938f: rep    retq
    0x7f5e86cf9391: nopl   (%rax)
(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGABRT
  * frame #0: 0x00007f5e86cf9387

since this was last reported on macOS, it seems unlikely related to #73972.

wfurt avatar Aug 30 '22 19:08 wfurt

@AaronRobinsonMSFT Could you please take a look?

The SafeHandle.handle is a good value (00000000000000A3), but it somehow turned into -1 by the time it got into the P/Invoke.

jkotas avatar Aug 30 '22 21:08 jkotas

since this was last reported on macOS

I can't seem to reproduce this on an M1. I will try Linux-x64 next.

AaronRobinsonMSFT avatar Aug 31 '22 21:08 AaronRobinsonMSFT

We spent days with @rzikm to reproduce it without any luck @AaronRobinsonMSFT. We really only have some dumps from CI.

wfurt avatar Aug 31 '22 22:08 wfurt

Got another repro on RedHat.7 https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-75287-merge-d8d21444e6d84e278f/System.Net.Mail.Functional.Tests/1/console.30b0113a.log?helixlogtype=result

dump is here.

wfurt avatar Sep 09 '22 18:09 wfurt

https://github.com/dotnet/runtime/blob/960e4d723c27a5407dc691d06e546bc455a9c4a5/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.Unix.cs#L139-L140

I think, this is the root cause of this issue. Because it's the only place that we can set the handle to -1 temporarily (via CreateSocket) outside of the Socket constructor. In this case, we're trying to close the socket (or set linger option) while trying to replace the handle.

SafeSocketHandle oldHandle = _handle;
SafeSocketHandle newHandle;
SocketError errorCode = SocketPal.CreateSocket(_addressFamily, _socketType, _protocolType, out newHandle);
_handle = newHandle;

Something like this should fix it.

liveans avatar Oct 16 '22 16:10 liveans

I can't quite see how that fixes it. It's an out parameter, so how is what you wrote different to the existing code?

danmoseley avatar Oct 16 '22 20:10 danmoseley

The problem is race condition, actually. At the beginning of the CreateSocket function we have something like this:

https://github.com/dotnet/runtime/blob/960e4d723c27a5407dc691d06e546bc455a9c4a5/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs#L59

Which means we're replacing the current Socket instance's handle with default constructed SafeSocketHandle, at this point we have a chance to hit -1 as handle's file descriptor if we're running code in multi-thread/async environment.

In the same function we have another line to update file descriptor:

https://github.com/dotnet/runtime/blob/960e4d723c27a5407dc691d06e546bc455a9c4a5/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs#L91

Edit: I deleted the wrong information

liveans avatar Oct 16 '22 20:10 liveans

Ah, I didn't realize it's multithreaded.

danmoseley avatar Oct 17 '22 02:10 danmoseley

Where is the code in SendAsync @liveans? I only saw some replacement during connect. I think this can still be a problem in cancellation & cleanup kicks in on thread pool. I would be probably worth of trying proposed change.

wfurt avatar Oct 18 '22 01:10 wfurt