AsyncIO icon indicating copy to clipboard operation
AsyncIO copied to clipboard

Could it be a potential issue of AsyncSocket?

Open wmjordan opened this issue 6 years ago • 11 comments

I have been using AsyncIO to write TCP applications for quite sometime. However, I encountered weird behaviors when the workload went a bit heavy, about several thousands of concurrent connections at a time.

I then studied the source code and deduct the possible production procedures of the issue:

  1. Called the Send or Receive method of the AsyncSocket (via the Windows implementation, the one that P/Invoke winsocks).
  2. Dispose the AsyncSocket before the overlapped operation is completed.
    1. The CancelIoEx is called, the cancellation is send on its way.
    2. closesocket is called.
    3. The Dispose (Free) of m_receivePinnedBuffer and m_sendPinnedBuffer will be called.
  3. GC reclaims the memory assigned to AsyncSocket.

According to the documetation of CancelIoEx:

Most types of operations can be canceled immediately; other operations can continue toward completion before they are actually canceled and the caller is notified. The CancelIoEx function does not wait for all canceled operations to complete. ...

The operation being canceled is completed with one of three statuses; you must check the completion status to determine the completion state. The three statuses are:

  1. The operation completed normally. This can occur even if the operation was canceled, because the cancel request might not have been submitted in time to cancel the operation.
  2. The operation was canceled. The GetLastError function returns ERROR_OPERATION_ABORTED.
  3. The operation failed with another error. The GetLastError function returns the relevant error code.

Thus, theoretically, the overlapped operation may still complete after the Dispose of AsyncSocket is called and write data back to the buffer that m_receivePinnedBuffer points to.

In most cases, it's OK since the aforementioned buffer is pointing to a managed byte array, which is just unpinned by the Free method and it usually may still be there.

However, if GC happens before the write-back, things can go wrong since the buffers have already been unpinned and can be moved by GC.

A rarer case is that the GC occurs when an overlapped sending operation and incorrect bytes will be read by the outgoing sender, before the cancellation is executed.

How do you think about the above situation? Please correct me if I am wrong.

wmjordan avatar Nov 03 '18 13:11 wmjordan

It is making sense. Any idea regarding a solution? Can you try comment out the cancelioex call and check if the problem still exist? I'm actually not sure we need the cancellation, however we will need a small state machine for dispose and actually free the memory correctly.

somdoron avatar Nov 04 '18 14:11 somdoron

Thank you for replying.

At this moment, I think immersing the PinnedBuffer into Overlapped and disposing the buffer along with the Overlapped may be the way to go, since the buffer is used by the Overlapped only and it can not be unpinned before the overlapped operation is finished.

wmjordan avatar Nov 05 '18 00:11 wmjordan

Can you make a PR?

somdoron avatar Nov 05 '18 07:11 somdoron

Sure.

wmjordan avatar Nov 05 '18 08:11 wmjordan

I'd previously posted a PR with some minor changes. Is it appropriate to merge it?

wmjordan avatar Nov 05 '18 08:11 wmjordan

The leakage problem is not yet fully solved. Sometimes Overlap instances can not be freed. I studied the source code closer and found that the sequence of statements in Dispose of AsyncSocket were so critical.

Currently there were four critical statements in the above method:

m_inOverlapped.Dispose();
m_outOverlapped.Dispose();
UnsafeMethods.CancelIoEx(Handle, IntPtr.Zero);
UnsafeMethods.closesocket(Handle);

When the Overlapped instances are disposed, it is very probable that overlap operations such as receiving or sending are outstanding. Thus the Free method inside Overlapped will not be called. When CancelIoEx is called, the overlapped operations will be canceled and the Overlapped.CompleteOperation MAY be executed, if the cancellation is in time. When closesocket is called, according to MSDN:

The closesocket function will initiate cancellation on the outstanding I/O operations, but that does not mean that an application will receive I/O completion for these I/O operations by the time the closesocket function returns. Thus, an application should not cleanup any resources (WSAOVERLAPPED structures, for example) referenced by the outstanding I/O requests until the I/O requests are indeed completed.

Since we rely on the completion of Overlapped.CompleteOperation to cleanup the resource consumed by Overlapped, but if the overlapped operation is canceled by closesocket, the CompleteOperation might NOT be called and thus the resource can't be reclaimed.

wmjordan avatar Nov 06 '18 09:11 wmjordan

If I placed a break point onto the CancelIoEx, and stepped over it, the CompleteOperation was called. But when no break point was there, and the completion port was still alive, sometimes the CompleteOperation was not called. Some overlapped operations ended without any notification.

wmjordan avatar Nov 06 '18 09:11 wmjordan

We currently not checking the return value and last error of cancelioex, maybe this is the direction?

On Tue, Nov 6, 2018, 11:16 wmjordan <[email protected] wrote:

If I placed a break point onto the CancelIoEx, and stepped over it, the CompleteOperation was called. But when no break point was there, and the completion port was still alive, sometimes the CompleteOperation was not called. Some overlapped operations ended without any notification.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/somdoron/AsyncIO/issues/26#issuecomment-436182885, or mute the thread https://github.com/notifications/unsubscribe-auth/AClv9udYLVUYhIVI93g4SSp_llK5gYxyks5usVNugaJpZM4YMy2L .

somdoron avatar Nov 06 '18 09:11 somdoron

I added code to check the error code of CancelIoEx. It could return true but the CompleteOperation was still not called later. I also tried to move the Dispose of Overlapped instances from before CancelIoEx to behind. It seemed that the CompleteOperation was missed less, but sometimes it just missed.

I will try more and return to you later.

wmjordan avatar Nov 06 '18 09:11 wmjordan

I could not yet find the reason why CompleteOperation could not be fired. I eventually created a class which ran thread and threw Overlapped instances which was not handled by CompleteOperation to that thread to destroy those instances after some time.

I recently read the documentation of CancelIoEx. It said that A Winsock client must never issue closesocket on s concurrently with another Winsock function call. Maybe it was potentially leading to another problem that I disposed an AsyncSocket when it was sending or connecting.

wmjordan avatar Jan 08 '19 12:01 wmjordan

Today I spent a whole day diagnosing this issue. I found a weird thing, which could even happen in the AsyncIO.Tests project. I ran the SendReceive test method.

And placed a breakpoint at the call to CancelIoEx in Socket.cs.

I found that when listener, serverSocket and clientSocket were disposing, the return values of CancelIoEx were all false, and I ran subsequent Marshal.GetLastWin32Error() via the immediate window of the debugger, after each call to CancelIoEx, the return values were all 1168, which indicated the corresponding asynchronous IO operations were not found.

Is it the behavior of CancelIoEx or are we missing something?

wmjordan avatar Apr 28 '20 12:04 wmjordan