runtime
runtime copied to clipboard
[libraries-pgo] Issues related to overlapped IO
Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.
Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result
Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result
Unhandled exception. System.InvalidOperationException: Handle is not initialized.
at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.
Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.
Issue Details
Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.
Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result
Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result
Unhandled exception. System.InvalidOperationException: Handle is not initialized.
at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.
Author: | jakobbotsch |
---|---|
Assignees: | - |
Labels: |
|
Milestone: | - |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Issue Details
Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.
Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result
Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result
Unhandled exception. System.InvalidOperationException: Handle is not initialized.
at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77
Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.
Author: | jakobbotsch |
---|---|
Assignees: | - |
Labels: |
|
Milestone: | - |
cc @dotnet/jit-contrib
I'll take a look.
For win-x64, issue seems to be related to codegen in this method (via Complus_JitEnablePatchpointRange
isolation)
System.Reflection.PortableExecutable.PEHeaders:TryGetDirectoryOffset(System.Reflection.PortableExecutable.DirectoryEntry,byref,bool):bool:this [Tier1-OSR @0x5e, IL size=136, code size=125, hash=0xd9042efb]
Hmm, that method may not be buggy though -- it is correlated with failure but not the cause of failure. I think it only gets called after the exception is raised.
So the root cause may not be bad codegen from OSR. Still digging.
Another possibility is that the tier0 code with patchpoints is somehow buggy. Further isolation suggests this is the likely culprit:
SocketDelegateHelper:ConnectEx(System.Net.Sockets.SafeSocketHandle,long,int,long,int,byref,ulong):bool:this [Tier0, IL size=98, code size=587, hash=0xd9557380]
Guessing the issue is that we insert a patchpoint callout between a pinvoke and fetching of last error:
G_M35967_IG06: ;; offset=011CH
FFD0 call rax
;; size=2 bbWeight=1 PerfScore 3.00
G_M35967_IG07: ;; offset=011EH
488B5580 mov rdx, qword ptr [rbp-80H]
C6420C01 mov byte ptr [rdx+0CH], 1
833D6B184B5E00 cmp dword ptr [(reloc 0x7ffcc493bd08)], 0
7406 je SHORT G_M35967_IG08
FF154B69495E call [CORINFO_HELP_STOP_FOR_GC]
;; size=23 bbWeight=1 PerfScore 9.00
G_M35967_IG08: ;; offset=0135H
488B4D80 mov rcx, qword ptr [rbp-80H]
488B9540FFFFFF mov rdx, bword ptr [rbp-C0H]
48895110 mov qword ptr [rcx+10H], rdx
8945B8 mov dword ptr [rbp-48H], eax
;; size=18 bbWeight=1 PerfScore 4.00
G_M35967_IG09: ;; offset=0147H
488BCC mov rcx, rsp
E87B000000 call G_M35967_IG21
;; size=8 bbWeight=1 PerfScore 1.25
G_M35967_IG10: ;; offset=014FH
90 nop
;; size=1 bbWeight=1 PerfScore 0.25
G_M35967_IG11: ;; offset=0150H
8B4D90 mov ecx, dword ptr [rbp-70H]
FFC9 dec ecx
894D90 mov dword ptr [rbp-70H], ecx
837D9000 cmp dword ptr [rbp-70H], 0
7F0E jg SHORT G_M35967_IG13
;; size=14 bbWeight=1 PerfScore 5.25
G_M35967_IG12: ;; offset=015EH
488D4D90 lea rcx, [rbp-70H]
BA45000000 mov edx, 69
E8E49BB95D call CORINFO_HELP_PATCHPOINT
;; size=14 bbWeight=0.01 PerfScore 0.02
G_M35967_IG13: ;; offset=016CH
FF15B64C93FE call [System.Runtime.InteropServices.Marshal:GetLastSystemError():int]
and so the last error state is corrupted.
Normally we wouldn't put a patchpoint here as this is not a loop backedge source or target, a so this is (currently) an OSR stress mode only problem.
Fix is either:
- avoid doing instrumentation in these places (seems tricky)
- ensure that last error is preserved (fairly straightforward in cases where the helper returns, but less obvious if the helper transitions -- maybe we are lucky and
ClrRestoreNonvolatileContext
or similar does not corrupt last error?
Seems like EnC could have similar problems preserving last error, especially if the interop code is source generated and there are user-accessible breakpoints between the pinvoke call and the last error handling.