DarkRift icon indicating copy to clipboard operation
DarkRift copied to clipboard

UDP Messages not received on Server if sent in specific order with players joining

Open CaptainPineapple opened this issue 2 years ago • 12 comments

Hello everyone,

Quick Description

I currently face the same issue as someone else already had in this closed issue here (issue nr 117). Core problem is that the server in certain situations does not receive unreliable messages anymore.

Explanation

Same as in issue 117 i am running my embedded unity server headless in a linux based docker container. When i run all my messages on reliable, the logic itself works perfectly. The idea however is to send position data for my players as unreliable messages. When i run a server where the players start sending unreliable data before another player joins everything works fine.

Problems start when 3 players join and do not send unreliable packages before other players have joined. In this case the messages will not reach the server for some reason.

So this works:
  • P1 joins
  • P1 sends unreliable message
  • P2 joins
  • P2 sends unreliable message
  • P3 joins
  • P3 sends unreliable message -> in this case all messages arrive on the server and can be acknowledged. All following traffic also works fine. Also if this continues with more players.
This does not work:
  • P1 joins
  • P2 joins
  • P3 joins
  • P1 sends unreliable message
  • P2 sends unreliable message
  • P3 sends unreliable message -> in this case none of the messages arrive on the server

For Issues

  • Unity Embedded in a linux docker container (image: ubuntu:18.04)
  • Unity Version 2021.1.7f
  • DR Version: 2.10.1 (latest as of now, think this also already occured on an earier version though)
  • as in issue 117 linked above i use Playfab but i don't think that this is relevant here as the versions in which i can reproduce both behaviours both contain the same playfab behaviour while no playfab related methods are being called until this point.

I will try to create a minimal setup where this can reliably be reproduced wqhen i find the time.

Thanks in advance for any comments, input, ideas on this. If there is information missing that might be helpful please let me know and i will happily provide it.

CaptainPineapple avatar Aug 15 '22 17:08 CaptainPineapple

Im not able to do a pull request. But I've fixed it on my fork.

See changes at BichannelListenerBase.cs https://github.com/xAL95/DarkRift/commit/4c288f1e45e04ac019372e6766315f3dd4f1d5c7#diff-a6d48fdd9f19645763e90587b1c2de3b4300c5aae96d4ac9006c68b4c4d95ac6

What caused the issue? Timer = new System.Threading.Timer(ConnectionTimeoutHandler, token, 5000, Timeout.Infinite)

I changed it to

Timer = CreateOneShotTimer(5000, delegate
{
 ConnectionTimeoutHandler(token);
})

using the "Timer" class from DarkRift.Server.

Also the function private void ConnectionTimeoutHandler

has been edited to fix this.

xAL95 avatar Aug 18 '22 17:08 xAL95

I'm tring to work out why this fix works. The only thing I can think of is that in the latter the delegate is capturing the value of the token differently to just passing in the state of the variable. In that case, the following code should also work (I wonder if you could test it since you know how to reporduce)

 Timer = new System.Threading.Timer(delegate
{
 ConnectionTimeoutHandler(token);
}, null, 5000, Timeout.Infinite)

The only other thing it could be is that the latter, if the Dispatcher is enabled, would be running the callback on the main thread instead, but I don't know why that would help!

If you'd like credit for a fix I'm happy to help you with creating a pull request etc. if you're not confident

JamJar00 avatar Aug 19 '22 11:08 JamJar00

Hi, the issue is that there seems to be a race. If you do

System.Threading.Thread.Sleep(5000);

above the BichannelClientConnection.cs udpSocket.Send(buffer);

Then sometimes you'll see

Untitled

As you see in the console print, one client gets connected and connectiontimeouthandler is called. Trying to remove the token but it does not exist.

With the function CreateOneShotTimer and

private void ConnectionTimeoutHandler(object token)
        {
            long t = (long)token;
            Console.WriteLine("ConnectionTimeoutHandler: " + t);
            lock (PendingTcpSockets)
            {
                if (PendingTcpSockets.ContainsKey(t) == false)
                {
                    Console.WriteLine("token does not exist");
                    return;
                }
            }

            EndPoint remoteEndPoint = CancelPendingTcpConnection(t);

            //Check found (should always be but will crash server otherwise)
            if (remoteEndPoint != null)
                Logger.Trace("Connection attempt from " + remoteEndPoint + " timed out.");
            

            connectionAttemptTimeoutsCounter.Increment();
        }

I get the message "token does not exist". There might still be a race but seems to be a lot less likely to happen Untitled

Oh and for testing I made 3 DarkRiftClient ConnectInBackground

xAL95 avatar Aug 19 '22 13:08 xAL95

So the solution right now which I have is:

use timer: Timer = new System.Threading.Timer(ConnectionTimeoutHandler, token, 5000, Timeout.Infinite)

private void ConnectionTimeoutHandler(object token)
        {
            EndPoint remoteEndPoint = CancelPendingTcpConnection((long)token, out bool tokenFound);
            if(tokenFound == false)
            {
                return;
            }

            //Check found (should always be but will crash server otherwise)
            if (remoteEndPoint != null)
                Logger.Trace("Connection attempt from " + remoteEndPoint + " timed out.");
            

            connectionAttemptTimeoutsCounter.Increment();
        }
private EndPoint CancelPendingTcpConnection(long token, out bool tokenFound)
        {
            lock (PendingTcpSockets)
            {
                if (PendingTcpSockets.ContainsKey(token) == false)
                {
                    tokenFound = false;
                    return null;
                }

                tokenFound = true;

                PendingConnection connection = PendingTcpSockets[token];
                connection.Timer.Dispose();

                EndPoint endPoint = null;
                try
                {
                    endPoint = connection.TcpSocket.RemoteEndPoint;

                    connection.TcpSocket.Close();
                }
                catch (SocketException e)
                {
                    if (endPoint != null)
                        Logger.Trace("A SocketException occurred whilst cancelling the connection to " + endPoint + ". It is likely the client disconnected before the server was able to perform the operation.", e);
                    else
                        Logger.Trace("A SocketException occurred whilst cancelling a connection. It is likely the client disconnected before the server was able to perform the operation.", e);
                }

                PendingTcpSockets.Remove(token);

                return endPoint;
            }
        }

xAL95 avatar Aug 19 '22 14:08 xAL95

Oh man guys, sounds amazing that there is a possible fix. As someone who has not delved into the DarkRift code it's kinda hard to believe that this fixes the issue as i fail to understand the connection to the symptom it causes but i guess i'll gladly and blindly accept this either way :D

If i can help with something please let me know. So far i'll try to pull and build your version to see if it fixes this issue for me as well. Thank you so much already!

Edit: Cannot get this to work. The built version of this fork is somehow not compatible with the UnityClient from the main Repo as this has apparently been removed from the forked version for some reason. Will wait for the Update on the main branch - as i already mentioned: please let me know if i can be of any help for this.

CaptainPineapple avatar Aug 20 '22 09:08 CaptainPineapple

You can try this fork from me https://github.com/xAL95/DarkRift Im using it with unity.

xAL95 avatar Aug 22 '22 12:08 xAL95

Hm i'd like to but i am using the UnityClient from the main Repo and that is not compatible with your current version for some reason/is not contained in your repo. (Or I'm too stupid to properly use both together - a possibility which cannot be dismissed :) )

CaptainPineapple avatar Aug 22 '22 15:08 CaptainPineapple

Hm i'd like to but i am using the UnityClient from the main Repo and that is not compatible with your current version for some reason/is not contained in your repo. (Or I'm too stupid to properly use both together - a possibility which cannot be dismissed :) )

Check the latest push.

xAL95 avatar Aug 22 '22 16:08 xAL95

well i did... i tried a lot to make this work with my project but can't get it to run. I tried removing the UnityClient from my project completly but that broke even more stuff. I managed to solve all issues to make the one from the main repo work with your current fork (i think it might even be broken with the main repo itself? gotta check that at some point..) However one issue i ran into which i am not sure about how to deal with it or if i am searching in the right place: So far my server were running the networklistener type "CompatibilityBichannelListener" which is not available when i build your fork. Do you know why? If i change it to BichannelListener the code runs but i can't establish connection anymore. Sadly i am not deep enough in DR anymore to know what the point of the Compatibility Listener was..

Edit since i forgot to further describe the thing: Basically i now try to connect a client to a server locally and the client always directly disconnects again for some reason. Will try to check if i can reproduce that in a minimal setup as well..

CaptainPineapple avatar Aug 29 '22 18:08 CaptainPineapple

CompatibilityBichannelListener is outdated. You can't mix main repo with my repo because I did some deep changes. Check darkrift logs. Maybe disconnect because of too many strikes?

xAL95 avatar Aug 30 '22 08:08 xAL95

okay, are those "deep changes" documented somewhere? Are those deep changes only the fix for the issue or did you also change the behaviour of DR in other ways?

If the latter is the case i'll have to politly decline testing this further.. I can't rework my project to test this if that new version is then contains unknown behaviour.. Would it be possible to pinpoint all changes needed to fix the bug? Is the only change the timer that you mentioned in your first post?

CaptainPineapple avatar Aug 31 '22 15:08 CaptainPineapple

So i created a MR for this issue containing only the changes that you proposed @xAL95 . I was able to make this clean version with only those changes work in my project and the issue is indeed gone. Thanks a lot for your help! Edit: Can be closed when you see fit. Otherwise i'd do this once the MR is merged. (In case this is wanted)

CaptainPineapple avatar Sep 04 '22 18:09 CaptainPineapple

Merged https://github.com/DarkRiftNetworking/DarkRift/pull/149, reopen if it's not sufficient.

4real avatar Dec 20 '22 14:12 4real