AsyncIO
AsyncIO copied to clipboard
Could it be a potential issue of AsyncSocket?
I have been using AsyncIO to write TCP applications for quite sometime. However, I encountered weird behaviors when the workload went a bit heavy, about several thousands of concurrent connections at a time.
I then studied the source code and deduct the possible production procedures of the issue:
- Called the
Send
orReceive
method of theAsyncSocket
(via the Windows implementation, the one that P/Invoke winsocks). -
Dispose
theAsyncSocket
before the overlapped operation is completed.- The
CancelIoEx
is called, the cancellation is send on its way. -
closesocket
is called. - The
Dispose
(Free
) ofm_receivePinnedBuffer
andm_sendPinnedBuffer
will be called.
- The
- GC reclaims the memory assigned to
AsyncSocket
.
According to the documetation of CancelIoEx
:
Most types of operations can be canceled immediately; other operations can continue toward completion before they are actually canceled and the caller is notified. The CancelIoEx function does not wait for all canceled operations to complete. ...
The operation being canceled is completed with one of three statuses; you must check the completion status to determine the completion state. The three statuses are:
- The operation completed normally. This can occur even if the operation was canceled, because the cancel request might not have been submitted in time to cancel the operation.
- The operation was canceled. The GetLastError function returns ERROR_OPERATION_ABORTED.
- The operation failed with another error. The GetLastError function returns the relevant error code.
Thus, theoretically, the overlapped operation may still complete after the Dispose
of AsyncSocket
is called and write data back to the buffer that m_receivePinnedBuffer
points to.
In most cases, it's OK since the aforementioned buffer is pointing to a managed byte array, which is just unpinned by the Free
method and it usually may still be there.
However, if GC happens before the write-back, things can go wrong since the buffers have already been unpinned and can be moved by GC.
A rarer case is that the GC occurs when an overlapped sending operation and incorrect bytes will be read by the outgoing sender, before the cancellation is executed.
How do you think about the above situation? Please correct me if I am wrong.
It is making sense. Any idea regarding a solution? Can you try comment out the cancelioex call and check if the problem still exist? I'm actually not sure we need the cancellation, however we will need a small state machine for dispose and actually free the memory correctly.
Thank you for replying.
At this moment, I think immersing the PinnedBuffer
into Overlapped
and disposing the buffer along with the Overlapped
may be the way to go, since the buffer is used by the Overlapped
only and it can not be unpinned before the overlapped operation is finished.
Can you make a PR?
Sure.
I'd previously posted a PR with some minor changes. Is it appropriate to merge it?
The leakage problem is not yet fully solved. Sometimes Overlap
instances can not be freed.
I studied the source code closer and found that the sequence of statements in Dispose
of AsyncSocket
were so critical.
Currently there were four critical statements in the above method:
m_inOverlapped.Dispose();
m_outOverlapped.Dispose();
UnsafeMethods.CancelIoEx(Handle, IntPtr.Zero);
UnsafeMethods.closesocket(Handle);
When the Overlapped
instances are disposed, it is very probable that overlap operations such as receiving or sending are outstanding. Thus the Free
method inside Overlapped
will not be called.
When CancelIoEx
is called, the overlapped operations will be canceled and the Overlapped.CompleteOperation
MAY be executed, if the cancellation is in time.
When closesocket
is called, according to MSDN:
The closesocket function will initiate cancellation on the outstanding I/O operations, but that does not mean that an application will receive I/O completion for these I/O operations by the time the
closesocket
function returns. Thus, an application should not cleanup any resources (WSAOVERLAPPED
structures, for example) referenced by the outstanding I/O requests until the I/O requests are indeed completed.
Since we rely on the completion of Overlapped.CompleteOperation
to cleanup the resource consumed by Overlapped
, but if the overlapped operation is canceled by closesocket
, the CompleteOperation
might NOT be called and thus the resource can't be reclaimed.
If I placed a break point onto the CancelIoEx
, and stepped over it, the CompleteOperation
was called. But when no break point was there, and the completion port was still alive, sometimes the CompleteOperation
was not called. Some overlapped operations ended without any notification.
We currently not checking the return value and last error of cancelioex, maybe this is the direction?
On Tue, Nov 6, 2018, 11:16 wmjordan <[email protected] wrote:
If I placed a break point onto the CancelIoEx, and stepped over it, the CompleteOperation was called. But when no break point was there, and the completion port was still alive, sometimes the CompleteOperation was not called. Some overlapped operations ended without any notification.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/somdoron/AsyncIO/issues/26#issuecomment-436182885, or mute the thread https://github.com/notifications/unsubscribe-auth/AClv9udYLVUYhIVI93g4SSp_llK5gYxyks5usVNugaJpZM4YMy2L .
I added code to check the error code of CancelIoEx
. It could return true
but the CompleteOperation
was still not called later.
I also tried to move the Dispose
of Overlapped
instances from before CancelIoEx
to behind. It seemed that the CompleteOperation
was missed less, but sometimes it just missed.
I will try more and return to you later.
I could not yet find the reason why CompleteOperation
could not be fired. I eventually created a class which ran thread and threw Overlapped
instances which was not handled by CompleteOperation
to that thread to destroy those instances after some time.
I recently read the documentation of CancelIoEx
. It said that A Winsock client must never issue closesocket on s concurrently with another Winsock function call. Maybe it was potentially leading to another problem that I disposed an AsyncSocket
when it was sending or connecting.
Today I spent a whole day diagnosing this issue. I found a weird thing, which could even happen in the AsyncIO.Tests project. I ran the SendReceive
test method.
And placed a breakpoint at the call to CancelIoEx
in Socket.cs.
I found that when listener
, serverSocket
and clientSocket
were disposing, the return values of CancelIoEx
were all false
, and I ran subsequent Marshal.GetLastWin32Error()
via the immediate window of the debugger, after each call to CancelIoEx
, the return values were all 1168
, which indicated the corresponding asynchronous IO operations were not found.
Is it the behavior of CancelIoEx
or are we missing something?