DarkRift icon indicating copy to clipboard operation
DarkRift copied to clipboard

MessageBuffers not disposed when ServerNetworkConnection CanSend==false

Open RobbieLePommie opened this issue 2 years ago • 8 comments

Quick Description

Received messages that MessageBuffers and AutoRecyclingArrays were not being recycled correctly. Tracked this down to "CanSend" being false, and thus rejecting messages.

Explanation

Symptom: warning message that MessageBuffers and AutoRecyclingArrays were not being recycled correctly.

Traced this to down to the following code lines

public override bool SendMessage[Un]Reliable(MessageBuffer message)
{
    if (!CanSend)
        return false;

Triggerable in a condition where messages are set to be sent to a client, but the client disconnects simultaneously. While additional checks for this case can be made before sending the message, there may be other conditions; it therefore good practice to ensure the buffer is disposed.

If appropriate, have created fix am using locally, and can turn into pull request: https://github.com/DarkRiftNetworking/DarkRift/compare/master...RobbieLePommie:DarkRift:RobbieLePommie-DisposeMessageBuffersOnFail

RobbieLePommie avatar Aug 23 '22 01:08 RobbieLePommie

Hi,

inside DarkRift.Client we also have a MessageBuffer not getting disposed.

The code: DarkRiftClient.cs

 public bool SendMessage(Message message, SendMode sendMode)
        {
            if (message.IsPingMessage)
                RoundTripTime.RecordOutboundPing(message.PingCode);

            return Connection.SendMessage(message.ToBuffer(), sendMode);
        }

the message.ToBuffer() creates a new MessageBuffer and copy all the bytes.

and then in BichannelClientConnection.cs we have

public override bool SendMessageReliable(MessageBuffer message)
        {
            if (connectionState == ConnectionState.Disconnected)
                return false;

fix would be

SendMessage (DarkRiftClient.cs)

public bool SendMessage(Message message, SendMode sendMode)
        {
            if (message.IsPingMessage)
                RoundTripTime.RecordOutboundPing(message.PingCode);

            var msgBuffer = message.ToBuffer();

            if(Connection.SendMessage(msgBuffer, sendMode) == false)
            {
                msgBuffer.Dispose();
                return false;
            }

            return true;
        }

and for DarkRift.Server I suggest the following fix: Client.cs

private bool PushBuffer(MessageBuffer buffer, SendMode sendMode)
        {
            if (!connection.SendMessage(buffer, sendMode))
            {
                buffer.Dispose();
                return false;
            }

            Interlocked.Increment(ref messagesPushed);

            return true;
        }

xAL95 avatar Aug 23 '22 07:08 xAL95

That all follows, thanks. Probably would be more logical/consistent to make them all "dispose" at the point of creation (when not successful) using the var msgBuffer = message.ToBuffer(); if([Send/Push(msgBuffer)] == false) { msgBuffer.Dispose(); } sequence.

I'll edit the commit above to include those when I can test.

RobbieLePommie avatar Aug 23 '22 23:08 RobbieLePommie

Oh very good finds! I knew there was a couple of instances where the MessageBuffers weren't getting recycled but I always thought it was in the receive code!

Personally, I actually think the disposing code should be in BichannelServerConnection/BichannelClientConnection. While a little counter intuative to recycling it near the code that creates the message, in the happy case where the message is sent fine the message buffer is recycled in the BichannelXConnection so recycling message buffers there in the unhappy case as well keeps things consistent and keeps a consistent 'pass the buffer and forget about it' semantic for callers.

If either of you want to go ahead with a PR I'll gladly take a look!

JamJar00 avatar Aug 25 '22 21:08 JamJar00

Pull request created;

I left those I fixed (top post) as they were; (Commit 1) I added xAL95's in the connection classes as requested, however one of these had to go deeper into the UDP class (where exception was thrown). (Commit 2)

RobbieLePommie avatar Aug 30 '22 07:08 RobbieLePommie

You need to hold off applying this pull request; there are some more knock on effects I'm working through, all realted to client disconnects.

As the buffer is attached to the sending/receiving "args", then we end up with thousands of [Warning] ObjectCacheMonitor 2394 AutoRecyclingArray objects were finalized last period. This is usually a sign that you are not recycling objects correctly. messages. The obvious solution is to also ObjectCache.ReturnSocketAsyncEventArgs(args); alongside the message dispose where needed.

While this appears to fix the initial issue, there are now issues that appear to be buffers being overwritten - my hunch is that a message is in flight in one thread, while a disconnect has killed the message, returned to ObjectPool and been taken over by a new message. Appears to happen on both send and receive.

Example exception:

   at DarkRift.MessageBuffer.EnsureLength(Int32 newLength)
   at DarkRift.DarkRiftWriter.Write(Boolean value)
   at TicTacToe.Shared.Models.GameTile.Serialize(SerializeEvent e)
   at DarkRift.DarkRiftWriter.Write[T](T[] value)
   at TicTacToe.Shared.Models.GameBoard.Serialize(SerializeEvent e)
   at DarkRift.DarkRiftWriter.Write[T](T serializable)
   at TicTacToe.Shared.Models.GameSetup.Serialize(SerializeEvent e)
   at DarkRift.DarkRiftWriter.Write[T](T serializable)
   at TicTacToe.Shared.Packets.FromServer.Game.GetSetupReply.Serialize(SerializeEvent e)
   at TicTacToe.Server.Common.MessageSender.SendMessage(IClient client, Packet packet, SendMode sendMode)

(Where "Send Message" is a straight using DarkRiftWriter writer = DarkRiftWriter.Create(System.Text.Encoding.UTF8); writer.Write(packet); while the packet is a chain of serialisations, those ms is enough time for the buffer to disppear?)

Still chasing these down, but if anyone has any solutions, would be grand. As mentioned, issues only appear after client(s) disconnect.

RobbieLePommie avatar Aug 31 '22 07:08 RobbieLePommie

Updated with new pull requests; tested with 10,000 connected clients, disconnecting / reconnecting a thousand or so in chunks of 200 at a time. (Note: for this test, also increased the default ObjectPool sizes, but that should not make any difference).

RobbieLePommie avatar Sep 01 '22 05:09 RobbieLePommie

Apologies for the slow follow up.

@RobbieLePommie is there a way we can put your test into the DarkRift.SystemTests project? Does it really require 10000 concurrent clients to assert itself?

4real avatar Oct 12 '22 13:10 4real

See note in other issue, as this is also related the other ObjectCache being "stolen" by other client.

However to test the first condition

  • massage "CanSend" to be false
  • sending a message (reliable or unreliable)
  • check the buffers returned to the pool

RobbieLePommie avatar Oct 12 '22 21:10 RobbieLePommie

Merged https://github.com/DarkRiftNetworking/DarkRift/pull/148, let me know if not sufficient.

4real avatar Dec 20 '22 14:12 4real