runtime icon indicating copy to clipboard operation
runtime copied to clipboard

[libraries-pgo] Issues related to overlapped IO

Open jakobbotsch opened this issue 2 years ago • 4 comments

Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.

Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug


https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
   at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: Handle is not initialized.
   at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
   at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.

jakobbotsch avatar Sep 19 '22 09:09 jakobbotsch

Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.

Issue Details

Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.

Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug


https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
   at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: Handle is not initialized.
   at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
   at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.

Author: jakobbotsch
Assignees: -
Labels:

area-System.Threading, untriaged

Milestone: -

ghost avatar Sep 19 '22 09:09 ghost

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.

Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug


https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
   at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: Handle is not initialized.
   at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
   at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

ghost avatar Sep 19 '22 09:09 ghost

cc @dotnet/jit-contrib

jakobbotsch avatar Sep 19 '22 09:09 jakobbotsch

I'll take a look.

AndyAyersMS avatar Sep 20 '22 15:09 AndyAyersMS

For win-x64, issue seems to be related to codegen in this method (via Complus_JitEnablePatchpointRange isolation)

System.Reflection.PortableExecutable.PEHeaders:TryGetDirectoryOffset(System.Reflection.PortableExecutable.DirectoryEntry,byref,bool):bool:this [Tier1-OSR @0x5e, IL size=136, code size=125, hash=0xd9042efb]

Hmm, that method may not be buggy though -- it is correlated with failure but not the cause of failure. I think it only gets called after the exception is raised.

So the root cause may not be bad codegen from OSR. Still digging.

AndyAyersMS avatar Sep 20 '22 16:09 AndyAyersMS

Another possibility is that the tier0 code with patchpoints is somehow buggy. Further isolation suggests this is the likely culprit:

 SocketDelegateHelper:ConnectEx(System.Net.Sockets.SafeSocketHandle,long,int,long,int,byref,ulong):bool:this [Tier0, IL size=98, code size=587, hash=0xd9557380]

AndyAyersMS avatar Sep 20 '22 18:09 AndyAyersMS

Guessing the issue is that we insert a patchpoint callout between a pinvoke and fetching of last error:

G_M35967_IG06:              ;; offset=011CH
       FFD0                 call     rax
						;; size=2 bbWeight=1    PerfScore 3.00
G_M35967_IG07:              ;; offset=011EH
       488B5580             mov      rdx, qword ptr [rbp-80H]
       C6420C01             mov      byte  ptr [rdx+0CH], 1
       833D6B184B5E00       cmp      dword ptr [(reloc 0x7ffcc493bd08)], 0
       7406                 je       SHORT G_M35967_IG08
       FF154B69495E         call     [CORINFO_HELP_STOP_FOR_GC]
						;; size=23 bbWeight=1    PerfScore 9.00
G_M35967_IG08:              ;; offset=0135H
       488B4D80             mov      rcx, qword ptr [rbp-80H]
       488B9540FFFFFF       mov      rdx, bword ptr [rbp-C0H]
       48895110             mov      qword ptr [rcx+10H], rdx
       8945B8               mov      dword ptr [rbp-48H], eax
						;; size=18 bbWeight=1    PerfScore 4.00
G_M35967_IG09:              ;; offset=0147H
       488BCC               mov      rcx, rsp
       E87B000000           call     G_M35967_IG21
						;; size=8 bbWeight=1    PerfScore 1.25
G_M35967_IG10:              ;; offset=014FH
       90                   nop      
						;; size=1 bbWeight=1    PerfScore 0.25
G_M35967_IG11:              ;; offset=0150H
       8B4D90               mov      ecx, dword ptr [rbp-70H]
       FFC9                 dec      ecx
       894D90               mov      dword ptr [rbp-70H], ecx
       837D9000             cmp      dword ptr [rbp-70H], 0
       7F0E                 jg       SHORT G_M35967_IG13
						;; size=14 bbWeight=1    PerfScore 5.25
G_M35967_IG12:              ;; offset=015EH
       488D4D90             lea      rcx, [rbp-70H]
       BA45000000           mov      edx, 69
       E8E49BB95D           call     CORINFO_HELP_PATCHPOINT
						;; size=14 bbWeight=0.01 PerfScore 0.02
G_M35967_IG13:              ;; offset=016CH
       FF15B64C93FE         call     [System.Runtime.InteropServices.Marshal:GetLastSystemError():int]

and so the last error state is corrupted.

Normally we wouldn't put a patchpoint here as this is not a loop backedge source or target, a so this is (currently) an OSR stress mode only problem.

Fix is either:

  • avoid doing instrumentation in these places (seems tricky)
  • ensure that last error is preserved (fairly straightforward in cases where the helper returns, but less obvious if the helper transitions -- maybe we are lucky and ClrRestoreNonvolatileContext or similar does not corrupt last error?

AndyAyersMS avatar Sep 20 '22 18:09 AndyAyersMS

Seems like EnC could have similar problems preserving last error, especially if the interop code is source generated and there are user-accessible breakpoints between the pinvoke call and the last error handling.

AndyAyersMS avatar Sep 20 '22 19:09 AndyAyersMS