
Asio Changes

Open colincornaby opened this issue 3 years ago • 8 comments

Pull request to cut networking over from NT sockets to ASIO. This change affects all operating systems including Windows.

This pull request is necessary to move forward on native macOS and Linux versions, but should be tested and considered carefully. It's working reliably on macOS, but the NT sockets version had a lot of handling for edge cases and timeouts, and there are open questions about whether the NT sockets version could even reach all of its defined code paths. It's not clear that the Windows version could do soft disconnects, which a lot of this code is for.

Once approved and merged, next steps will be pull requests to start pulling in the native macOS client. That's not going to be a quick process, though. I need to do more review on the Mac code to make sure it meets code standards and that any obvious rough edges are worked out. This will allow the Metal branch to pull in ASIO from master and fully build.

colincornaby avatar Jul 01 '22 04:07 colincornaby

Just checking in -- is this ready for a review?

Hoikas avatar Jul 19 '22 00:07 Hoikas

@Hoikas - Yes, two notes:

  • It's as ready for review as it's probably going to be from my end. It needs greater testing on Windows, and possibly wider feedback.
  • That said, this code has been working extremely reliably on Mac; I can participate in shards on the Mac and patch with no issue. I was actually about to ask about testing on Gehn, but maybe everyone here should take a look first.

colincornaby avatar Jul 19 '22 01:07 colincornaby

Gave this PR a try with two players on the same server. Everything went well except for this assert, which would sometimes get triggered on link out:

ASSERTION FAILED: File: /Users/colincornaby/Documents/Plasma/PlasmaSrc/Sources/Plasma/PubUtilLib/plNetClient/plNetClientMgr.cpp > Line: 391 Message: Still remote players left when linking out

I'm not sure if this is a known issue or unique to this socket implementation. And I don't have a clear enough idea of the networking to quite know where to look.

colincornaby avatar Jul 19 '22 05:07 colincornaby

This has been in use on the TL client for several weeks with no adverse effects, FWIW. Full review still pending.

Hoikas avatar Aug 19 '22 00:08 Hoikas

I've seen some weirdness on Linux when I tried to run my plFilePatcher PR with this: it aborts on exit. Entirely possible I'm doing something wrong, but probably worth investigating.

dpogue avatar Aug 19 '22 00:08 dpogue

I can investigate the combination, but you might be the best person to take a first swing at it, @dpogue. I'm juggling a few things, and it would make me feel better to have someone besides me crawling through this code since it's so critical.

colincornaby avatar Aug 19 '22 00:08 colincornaby

I'll try to investigate, but it might be into September before I get a chance to really dig into it.

dpogue avatar Aug 19 '22 00:08 dpogue

Totally understood. If you could write up repro steps (since I haven't used that tool yet), I can probably look in the next few weeks. I don't have all the environments available to me (and my Windows setup is still kind of a mess), so it might take me some time to get to the bug if it's not happening on macOS.

colincornaby avatar Aug 19 '22 00:08 colincornaby

Hmmm... Looks like we have a conflict, and (for some reason) AppVeyor has returned???

Hoikas avatar Nov 14 '22 00:11 Hoikas

Hmmm... Looks like we have a conflict, and (for some reason) AppVeyor has returned???

I'll start with rebase and see if I can get anywhere.

colincornaby avatar Nov 14 '22 04:11 colincornaby

The rebase seems to have solved the conflict (there was a bug fix in the NT sockets) and AppVeyor seems to be gone again.

colincornaby avatar Nov 14 '22 04:11 colincornaby

I've reverted the change that added the subclass of std::thread that tried to patch in a timed join via a condition. There were a lot of issues trying to work around the std::thread constructor. Another developer might have more luck than I did, but I haven't seen any other implementation adding this functionality to std::thread via a subclass yet.

To review the issue: std::thread does not like the underlying thread state being changed, and will throw/log errors if the underlying thread state does not match its own bookkeeping. This can cause problems with the platform specific code that's trying to implement a timed thread join.

  • On Windows there isn't a problem, because the Windows code is doing a timed observation of thread completion; it's not actually trying to join the thread, so no underlying thread state is changed.
  • The Linux code is a problem because it's actually attempting a join, which will change the thread state (see the sketch below).
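
As a minimal illustration of this class of bookkeeping error (this is just standard library behavior, not Plasma code): once the underlying thread state no longer matches what the std::thread object believes, operations like join throw std::system_error.

```cpp
#include <iostream>
#include <system_error>
#include <thread>

int main()
{
    std::thread t([] { /* worker body */ });
    t.detach(); // the underlying thread no longer matches the object's state

    try {
        t.join(); // std::thread's bookkeeping says "not joinable" -> throws
    } catch (const std::system_error& e) {
        std::cerr << "join failed: " << e.what() << '\n';
    }
}
```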

A few solutions have been proposed.

Conditions

  • I've been looking at putting a condition into the thread. Conditions can be waited on with a timeout. When the thread is done, it signals the condition, waking any observer that is waiting on thread completion. This is the approach Boost uses.
  • Trying to implement this directly in std::thread is troublesome. I have to make sure both the thread's body and the thread object have access to the same condition, and the thread body is created in std::thread's constructor. The condition also has to survive object moves.
  • Writing our own thread wrapper around std::thread seems more plausible, but requires deeper surgery in the code (a sketch follows this list).
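
To make the condition approach concrete, here is a minimal sketch of such a wrapper. The names are mine, not the PR's; the shape mirrors what Boost's thread::try_join_for does: the body signals a shared condition on exit, and the timed join only performs the real join once the body is known to have finished.

```cpp
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

class TimedJoinThread
{
    struct Done
    {
        std::mutex mtx;
        std::condition_variable cv;
        bool finished = false;
    };
    std::shared_ptr<Done> fDone; // shared_ptr: survives moves of the wrapper
    std::thread fThread;

public:
    template <typename Fn>
    explicit TimedJoinThread(Fn&& fn)
        : fDone(std::make_shared<Done>())
    {
        // The body and the wrapper share the condition through the shared_ptr,
        // which sidesteps the constructor-ordering problem described above.
        fThread = std::thread([done = fDone, fn = std::forward<Fn>(fn)]() mutable {
            fn();
            std::lock_guard<std::mutex> lock(done->mtx);
            done->finished = true;
            done->cv.notify_all();
        });
    }

    TimedJoinThread(TimedJoinThread&&) noexcept = default; // condition survives moves

    // Wait up to `timeout` for the body to finish; only call the real join
    // (which changes the underlying thread state) once it has.
    bool TimedJoin(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(fDone->mtx);
        if (!fDone->cv.wait_for(lock, timeout, [this] { return fDone->finished; }))
            return false; // timed out; the thread is still running
        lock.unlock();
        fThread.join(); // returns promptly; the body has already finished
        return true;
    }

    ~TimedJoinThread()
    {
        if (fThread.joinable())
            fThread.join();
    }
};
```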

Futures

  • @dpogue has proposed a change that spawns a second thread, which performs the join and fulfills a promise; the promise's state can then be observed with a timeout (a sketch follows below).
  • This behavior isn't quite the same as a timed join. If a timed join times out, the join is cancelled. If waiting on the promise times out, the joining thread keeps running in the background, and the join will eventually happen.
  • There are other side effects as well: the behavior being implemented is that the thread will be detached after a timeout. If the background join does eventually execute, it will throw an exception, because it's illegal to join a thread after it's been detached. We could technically catch this exception.
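
A rough sketch of that shape (illustrative names, not the actual PR code), assuming shared ownership of the thread object so the background join stays valid:

```cpp
#include <chrono>
#include <future>
#include <memory>
#include <thread>

// A detached helper performs the blocking join and reports via a promise.
bool TryJoinFor(std::shared_ptr<std::thread> th, std::chrono::milliseconds timeout)
{
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> fut = done->get_future();

    std::thread([th, done] {
        th->join();        // blocks; keeps running even after a timeout below
        done->set_value();
    }).detach();

    // On a timeout the helper is still alive and will finish the join later;
    // unlike a real timed join, nothing gets cancelled.
    return fut.wait_for(timeout) == std::future_status::ready;
}
```

The variant described above additionally detaches the target thread after a timeout, which is exactly what makes the background join eventually throw.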

Do we need to worry about this?

I'm a little worried about this class of bug in general. The threads are being force-detached so they can be killed mid-execution. This could be very bad, and we should probably be wary of it. I'm not sure of the entire range of things these threads do, but they are attached to different types of I/O. We may want to be careful about just killing I/O midway through execution.

Because this code also touches critical functions, the possibility of a deadlock should be concerning on its own. We may not want to silently kill deadlocks in code that should not be deadlocking. Hiding the deadlocks could be hiding more serious issues and underlying I/O failures.

I'm not an expert on everything these threads are doing, though. Someone else on the thread might have a better idea of how much danger is possibly lurking here. And maybe there are known deadlocks here that I'm not familiar with.

Personally, I'm still in favor of shipping with a plain (non-timed) join on *nix, for the reasons above. I'm not eager to paper over deadlock issues with this fallback, or to cut off slow or deadlocked threads that may be doing real I/O. But I've mostly been focusing on the render portions of this engine, so there may be things I'm not seeing or understanding here. We could also continue to deal with the timed join on *nix in follow-up PRs.

colincornaby avatar Dec 04 '22 08:12 colincornaby

I assume the whole timed join stuff was originally put in to prevent the game from hanging indefinitely on shutdown when a deadlock does occur... What if, instead of trying to solve the problem of timed joins on individual threads, we just use a normal join everywhere, but spawn a single thread during shutdown that waits for a while and then std::terminates the whole process (or debug break in debug builds) if it doesn't get cancelled before its timeout expires? That way, we still prevent the possibility of indefinite hangs at shutdown, but don't have to deal with individual threads being left in a bad state, or with trying to make timed joins work across different platforms.
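
That could look something like this sketch (the names are my own invention): plain joins everywhere, plus one watchdog armed at the start of shutdown and cancelled once the joins complete.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <mutex>
#include <thread>

class ShutdownWatchdog
{
    std::mutex fMtx;
    std::condition_variable fCv;
    bool fCancelled = false;
    std::thread fThread;

public:
    explicit ShutdownWatchdog(std::chrono::seconds timeout)
    {
        fThread = std::thread([this, timeout] {
            std::unique_lock<std::mutex> lock(fMtx);
            if (!fCv.wait_for(lock, timeout, [this] { return fCancelled; })) {
#ifndef NDEBUG
                std::abort();     // debug builds: stop here for the debugger
#else
                std::terminate(); // shutdown is stuck; tear the process down
#endif
            }
        });
    }

    // Call once every thread has been joined normally.
    void Cancel()
    {
        {
            std::lock_guard<std::mutex> lock(fMtx);
            fCancelled = true;
        }
        fCv.notify_one();
        fThread.join();
    }

    ~ShutdownWatchdog()
    {
        if (fThread.joinable())
            Cancel();
    }
};
```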

zrax avatar Dec 04 '22 20:12 zrax

Would std::terminate solve the problem of threads dying mid-work? I think it would still cut off threads mid-execution (especially if the intention is to kill deadlocked threads).

I'm doing some research into the source history, but I'm also still concerned about covering up deadlocked threads in general. A deadlocking thread would indicate some serious problem in the engine, and it feels a little weird to plan for one. Obviously the client deadlocking on exit is not a great experience either. But if threads are having deadlocking issues, the deadlock could have happened halfway through the user's play session. Timed joins can be useful; I'm just worried about the implications here. Threads are complicated, and there is potential for deadlocks that weren't understood when the client shipped. But thread deadlocks seem like something to fix directly, because of the implications of a network thread deadlocking.

Another option is to go back to (I know I know) something like 2076f8a6933ecc9e58da2681cc0be7bd695b9880 (changed in abcbba7a45d0a20efecdb6b8fd0b2ed25eb42065). The structure there would be usable for adding auxiliary things like a condition member necessary for cross platform timed joins.

But again - still worried about covering up deadlocks in general. Maybe we could log a warning message in debug builds so it at least gets some attention during development. Still worried about implications of deadlocking threads in a released build though.

colincornaby avatar Dec 04 '22 23:12 colincornaby

This is a pretty good reference on Stack Overflow, and it notes that if main exits, all threads will be instantly killed mid-execution. So if there is a deadlock, I don't think there is a safe way to exit. (Although a user killing the process may just end up being the same amount of unsafe.)

https://stackoverflow.com/questions/22803600/when-should-i-use-stdthreaddetach

This answer is also pretty good on the utility of detach:

This answer is aimed at answering question in the title, rather than explaining the difference between join and detach. So when should std::thread::detach be used?

In properly maintained C++ code std::thread::detach should not be used at all. Programmer must ensure that all the created threads gracefully exit releasing all the acquired resources and performing other necessary cleanup actions. This implies that giving up ownership of threads by invoking detach is not an option and therefore join should be used in all scenarios.

However some applications rely on old and often not well designed and supported APIs that may contain indefinitely blocking functions. Moving invocations of these functions into a dedicated thread to avoid blocking other stuff is a common practice. There is no way to make such a thread to exit gracefully so use of join will just lead to primary thread blocking. That's a situation when using detach would be a less evil alternative to, say, allocating thread object with dynamic storage duration and then purposely leaking it.

I have noticed Plasma is using things like global mutexes that, when not properly cleaned up, will cause the next session to fail to launch. I'm not sure if that's happening in any of these paths, but there is certainly state that can be a problem when not cleaned up.

Again, if the user does the force quit themselves, it may end up being the same thing. So I'm not sure if we should be the ones putting things into a bad state, or if we should wait for the user to do it.

colincornaby avatar Dec 05 '22 00:12 colincornaby

On macOS I'm noticing the ASIO networking is maxing out at about 800 KB/s. I thought maybe it was a bandwidth restriction of Trollland, but I'm not seeing the same issue with the NT sockets on Windows. I'll dig a bit more.

colincornaby avatar Dec 10 '22 19:12 colincornaby

Issue seems to be Mac (and maybe Linux) specific - Windows version is getting the expected download speeds with ASIO.

colincornaby avatar Dec 10 '22 20:12 colincornaby

I can get macOS performance to improve to match Windows if I increase the size of the network receive buffer. But Linux is showing the same issues and the buffer size change doesn't work there.

colincornaby avatar Dec 10 '22 23:12 colincornaby

Some of this conversation happened in IRC - but repeating here.

I've removed the manual buffer size configuration, which has resolved the issues on macOS and Linux.
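
For context, the removed configuration was along these lines (a hedged sketch, not the PR's exact code):

```cpp
#include <asio.hpp>
#include <iostream>

void InspectReceiveBuffer(asio::ip::tcp::socket& sock)
{
    // The removed code pinned SO_RCVBUF to a fixed size, roughly:
    //     sock.set_option(asio::socket_base::receive_buffer_size(64 * 1024));
    // Leaving the option unset lets the OS grow the buffer to match the
    // connection's bandwidth-delay product.

    asio::socket_base::receive_buffer_size opt;
    sock.get_option(opt); // observe what the stack chose on its own
    std::cout << "receive buffer: " << opt.value() << " bytes\n";
}
```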

Furthermore, Microsoft recommends not setting these options; the network stack can configure these buffers more intelligently on its own: https://techcommunity.microsoft.com/t5/networking-blog/windows-network-performance-suffering-from-bad-buffering/ba-p/339668

The reasoning makes sense. When the buffer fills, there is an additional interaction between the client and the server as TCP/IP manages the overflow. The latency on my fiber connection is 5-10 ms, so with a 64 KB buffer I'd incur that stall for every 64 KB received. That constrains download speeds to approximately 800 KB/s, which matches what I'm seeing.

There is still the mystery of why Windows is unaffected. Microsoft's documentation says you can suggest a buffer size, but Windows may quietly not honor the request. That could be what's happening here, especially if Microsoft feels too many applications have been abusing this configuration option.

One open question is how far back Windows has intelligently managed buffer sizes. The MS tech note says Windows 10, but it was written during the Windows 10 era. I've seen some hints that Windows 10's modern network stack was introduced in Vista. But that doesn't mean XP's network stack didn't make the same automatic adjustments to the network buffer. That said, I have not tested this change on XP.

colincornaby avatar Dec 12 '22 04:12 colincornaby

Going through and looking more at the timeout stuff. I tried to unwind timeouts but then realized how deeply they're implemented.

DNS resolution, for example, uses timeouts in its destructors. This may have made sense at one time: you wouldn't want a destructor held up by a stalled network request. But the network stack has built-in cancellation support that can cause an immediate return from network functions, and those cancellation functions are being invoked.
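
For example, ASIO's resolver exposes that cancellation directly; a small standalone-ASIO sketch (note that cancellation of an already-in-flight lookup is best-effort on some platforms):

```cpp
#include <asio.hpp>
#include <iostream>

int main()
{
    asio::io_context io;
    asio::ip::tcp::resolver resolver(io);

    resolver.async_resolve("example.com", "80",
        [](const asio::error_code& ec, asio::ip::tcp::resolver::results_type) {
            if (ec == asio::error::operation_aborted)
                std::cout << "lookup cancelled promptly; no timed wait needed\n";
        });

    resolver.cancel(); // the pending async_resolve completes immediately
    io.run();
}
```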

I'm also looking at ThreadDestroy in pnAceThread.cpp (https://github.com/colincornaby/Plasma/blob/5d535f580a897ea0d74631aacbff9456d3acd1ca/Sources/Plasma/NucleusLib/pnAsyncCoreExe/pnAceThread.cpp#L62). I think this function is meant to ensure all threads are destroyed. But even though it's called destroy, it doesn't actually destroy anything; it just waits.

The one new issue I see around ThreadDestroy is that it relies on a PerfCounter to decide when all threads have been destroyed. When AceThread was a wrapper around std::thread, the wrapper would automatically increment and decrement the counter. Now that std::thread is being used without a wrapper, each thread is responsible for incrementing and decrementing its own PerfCounter. This creates room for a lot of possible bugs. If a thread forgets to increment/decrement the counter, or enters some unexpected control flow that causes one of those things not to happen, ThreadDestroy will time out unnecessarily.

Nothing here is outright broken. But it is kind of code-smelly. Again, most of this was inherited from the original source.

I think I'm leaning in favor of re-wrapping std::thread with the AceThread wrapper again, and enforcing things like the increment/decrement of the PerfCounter. Re-wrapping std::thread would also give more room for properly implementing a cross-platform async join (even if I'm skeptical of async-join-or-die in general).
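
A sketch of what that enforcement could look like (the names are illustrative, not the actual AceThread API): the wrapper owns the counter bookkeeping, and a RAII guard guarantees the decrement on every control path out of the body, including exceptions.

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<long> s_threadCount{0}; // stand-in for the PerfCounter

class AceThread
{
    std::thread fThread;

    struct CountGuard
    {
        ~CountGuard() { --s_threadCount; } // runs however the body exits
    };

public:
    template <typename Fn>
    explicit AceThread(Fn&& fn)
    {
        ++s_threadCount; // counted before launch; no uncounted window
        fThread = std::thread([fn = std::forward<Fn>(fn)]() mutable {
            CountGuard guard;
            fn();
        });
    }

    void Join() { fThread.join(); }
};
```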

colincornaby avatar Dec 19 '22 04:12 colincornaby

I've tested this on Linux with plFilePatcher, and it was all good. Previous issues around threads aborting on exit are gone.

dpogue avatar Dec 22 '22 06:12 dpogue

I'm going to bake the wrapper down into the other parts of the code that use async join. I still think the wrapper itself could use some review; it exists in an awkward place between C and C++. But if it unblocks Linux and keeps things moving forward, that's good. It is a code cleanliness regression, since using std::thread directly would be cleaner, but I don't think a bare std::thread is workable here. There are changes in future versions of C++ threading that may obsolete this wrapper, but those are in C++20 and beyond.
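
Presumably that refers to C++20's std::jthread with cooperative cancellation via std::stop_token; a minimal sketch, assuming a C++20 toolchain:

```cpp
#include <chrono>
#include <stop_token>
#include <thread>

int main()
{
    // std::jthread joins automatically in its destructor and delivers a
    // cooperative stop request through a stop_token.
    std::jthread worker([](std::stop_token stop) {
        while (!stop.stop_requested()) {
            // ... poll sockets, do work ...
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    });

    worker.request_stop(); // cooperative: the body decides when to exit
}   // destructor requests stop again and joins; no detach, no timed join
```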

colincornaby avatar Dec 23 '22 03:12 colincornaby

I mentioned on IRC I was having a weird issue - it was actually a known issue. plFilePatcher deadlocks when it's run against a working directory with no server.ini. There are underlying issues in the engine if the servers aren't configured. It's an edge case because the client will just give up if the server.ini isn't detected. But plFilePatcher doesn't have similar catches.

I did figure out the deadlock and fixed it, but that's a bug in the core networking code and not the ASIO code, so I'll leave it for a different pull request. Just letting everyone know the issue I was working on is not related to ASIO. I just forgot to set my working directory...

colincornaby avatar Dec 25 '22 03:12 colincornaby

There are some warnings about unsigned int and size_t mismatches. I'm going to standardize on size_t where I can, since that's best practice for buffer sizes.

colincornaby avatar Dec 29 '22 07:12 colincornaby

I don't think I understand why AsyncThreadRef exists. Why can't we pass around references to the AsyncThread itself instead?

Hoikas avatar Dec 30 '22 04:12 Hoikas

I don't think I understand why AsyncThreadRef exists. Why can't we pass around references to the AsyncThread itself instead?

We can. That would be a similar change to what was done for passing references to the other structure. I can make that change.

colincornaby avatar Dec 30 '22 04:12 colincornaby

Ok, coming back around to AsyncThreadRef again.

One of the issues with AsyncThread is that it has to exist in two threads at once. There's the main thread, which adds it to an array to be tracked, and then the spawned thread needs its own access to the mutex. Both threads need to own the AsyncThread simultaneously; if it goes out of scope in one thread, it still needs to be alive in the other. This is fairly incompatible with reference semantics.

shared_ptr does solve this problem rather well, and AsyncThreadRef covers that implementation detail. But it could be removed and all the code tweaked to work with shared_ptr directly.
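
A condensed sketch of the ownership situation (my names, not the PR's): the tracking array and the thread body each hold a shared_ptr, so the object lives until whichever side releases it last.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct AsyncThread
{
    std::mutex fMutex;
    // ... state that both the manager and the thread body need ...
};

std::vector<std::shared_ptr<AsyncThread>> s_tracked; // owner #1: the array
std::vector<std::thread> s_workers;

void SpawnTracked()
{
    auto at = std::make_shared<AsyncThread>();
    s_tracked.push_back(at);
    s_workers.emplace_back([at] {              // owner #2: the body itself
        std::lock_guard<std::mutex> lock(at->fMutex);
        // ... work; `at` stays alive here even if s_tracked drops it ...
    });
}
```

AsyncThreadRef is essentially a named wrapper over that shared_ptr.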

colincornaby avatar Dec 30 '22 06:12 colincornaby

Thanks for the summary!

Hoikas avatar Dec 31 '22 16:12 Hoikas

I think we are getting very close to extensive testing and merge taim. It would be great if you could take care of the code style issues in the interim 😄

I'll give the new code a run-through with Clang Format and see if that catches the format issues.

colincornaby avatar Dec 31 '22 20:12 colincornaby

It seems like the testing tasks we've wanted to do so far have been completed. Just checking in to see if there is more testing people would like to do.

colincornaby avatar Jan 31 '23 18:01 colincornaby