
PVS & client state handling changes

ElectroJr opened this pull request 2 years ago • 4 comments

Requires space-wizards/space-station-14/pull/9273

Fixes #2553 Fixes #2872

Summary

The point of this PR is to fix the stuttering caused by packet loss, while also making some general changes to the client game state manager for my own sanity (e.g., see #2876). Instead of LastProcessedRealState, CurServerTick, _lastProcessedTick, and LastRealTick, there is now just LastRealTick and LastProcessedTick. A good chunk of this PR was me trying to make sense of the client game state manager, and I might've misunderstood some of it or made simple mistakes, so someone who is familiar with it should definitely have a look.

AFAICT the end result is that the game performs better for clients when it comes to loading and un-loading PVS chunks. Once a map has been seen (i.e., no more entity spawning), I can walk around without stuttering in debug, though it still dips to ~100 fps when aghosting. However, server-side PVS performance will probably decrease, partly just because lastAck is no longer always curTick-1, and partly due to having to track acked entities. On the other hand, it should generally send less data as players move around in previously explored & unmodified areas.

What caused the stuttering

See #2872.

Basically, due to an error in MsgState, all game state messages were being sent reliably, which always overwrites a client's last-acked tick. As a result, as far as the server was concerned it only ever had to send a single tick of data. If one of those packets gets dropped, this leaves the client with a hole in its state buffer, causing it to basically freeze until that state gets re-sent.
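
For illustration, here is a minimal C# sketch of why single-tick deltas stall on packet loss. The names (`DeltaState`, `StateBufferSketch`, `TryApplyNext`) are hypothetical and not RobustToolbox's actual API; the point is just that a delta can only be applied once everything before its from-tick has been applied, so one dropped packet blocks the whole buffer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical delta-state record; the real GameState type differs.
public sealed record DeltaState(uint FromTick, uint ToTick);

public sealed class StateBufferSketch
{
    private readonly List<DeltaState> _buffer = new();
    private uint _lastAppliedTick;

    public void OnStateReceived(DeltaState state) => _buffer.Add(state);

    // Because of the MsgState bug, every delta effectively covered exactly one
    // tick (FromTick == ToTick - 1). If that one packet is dropped, no buffered
    // state is applicable and the client freezes until it gets re-sent.
    public bool TryApplyNext()
    {
        var next = _buffer.FirstOrDefault(
            s => s.FromTick <= _lastAppliedTick && s.ToTick > _lastAppliedTick);

        if (next is null)
            return false; // hole in the buffer -> nothing can be applied

        Apply(next);
        _lastAppliedTick = next.ToTick;
        _buffer.RemoveAll(s => s.ToTick <= _lastAppliedTick);
        return true;
    }

    private static void Apply(DeltaState state) { /* apply entity states here */ }
}
```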

Why just fixing MsgState is insufficient

Fixing the MsgState issue then reveals existing issues where the PVS system cannot properly account for the client's last-acked tick. This was partially fixed by #2733, but there are still some issues, including:

  • The server uses Remove(uid) on the last-acked state dictionary, despite that dictionary being re-used if no ack is received.
  • The limited overflow dictionary size, combined with a client's limited PVS budget could result in a situation where the last-acked visible set is always null, causing every entity to be considered as having just entered the client's view.
  • With only minimal changes, this would result in the PVS system spending TickBuffer ticks respecting the client's PVS budget, and then just dumping entities on the client once it overflowed.

The general fix for these problems is to properly track the last-acked entity set & have some basic way of dealing with laggy clients that exceed the dictionary size.
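
As a rough sketch of that idea (all names here are hypothetical, not the actual PVS code): the server keeps the entity set sent at each recent un-acked tick per session, promotes the matching set when an ack arrives, and falls back to treating everything as newly-entering when a laggy client blows past the cap.

```csharp
using System.Collections.Generic;

public sealed class PvsAckTrackingSketch
{
    private const int MaxUnackedTicks = 64; // hypothetical overflow budget

    // Entities included in each not-yet-acked tick's state for this session.
    private readonly Dictionary<uint, HashSet<int>> _sentSets = new();

    // The set the client has confirmed seeing; null => treat everything as new.
    public HashSet<int>? LastAckedSet { get; private set; }

    public void OnStateSent(uint tick, HashSet<int> sentEntities)
    {
        if (_sentSets.Count >= MaxUnackedTicks)
        {
            // Laggy client exceeded the budget: forget the ack history instead
            // of growing without bound; until the next ack, every entity is
            // treated as having just entered the client's view.
            _sentSets.Clear();
            LastAckedSet = null;
        }

        _sentSets[tick] = sentEntities;
    }

    public void OnAckReceived(uint ackedTick)
    {
        if (_sentSets.TryGetValue(ackedTick, out var set))
            LastAckedSet = set;

        // Older ticks can never be acked again; drop them.
        foreach (var tick in new List<uint>(_sentSets.Keys))
        {
            if (tick <= ackedTick)
                _sentSets.Remove(tick);
        }
    }
}
```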

Bandwidth & client-stuttering

On its own, the above fix significantly increases both the bandwidth and the client-side PVS-related lag.

  • aghost-ing around saltern after all entities have loaded used to send ~30-50 KiB/s
    • With the default lag options (~150ms @ 60tps), this would then have increased to ~300-500 KiB/s, since each state message now has to cover the ~9-10 ticks that are still un-acked rather than just one.
    • Obviously real players are slower than aghosts and don't move around that much, but it's still a lot.
  • The client blindly applies the whole received entity state every tick, regardless of whether that state contains old data that has already been applied. So instead of applying ~50 new entity states per tick, it would apply ~500.

The client stuttering was largely removed by sending information about the last-modified tick along with the entity states, and having the client state handling only apply states that actually contain new information.
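
A hedged sketch of that filtering step (the shapes here are illustrative; the real EntityState handling differs):

```csharp
using System.Collections.Generic;

// Hypothetical entity-state shape carrying the tick it was last modified at.
public sealed record EntityStateSketch(int Uid, uint LastModifiedTick);

public static class StateApplicationSketch
{
    // Skip entity states whose data is no newer than what the client has
    // already applied, so a delta spanning many ticks doesn't re-apply
    // hundreds of unchanged entities every tick.
    public static IEnumerable<EntityStateSketch> FilterNew(
        IEnumerable<EntityStateSketch> received, uint lastAppliedTick)
    {
        foreach (var state in received)
        {
            if (state.LastModifiedTick <= lastAppliedTick)
                continue; // already applied this data

            yield return state;
        }
    }
}
```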

The bandwidth was reduced by changing how leaving-PVS / detach-to-null works. Previously, PVS would just send a fake transform component state that was otherwise indistinguishable from an actual state for an entity in null-space. These have now been separated out and are sent reliably & separately as a simple list of detached entities (MsgStateLeavePvs). When entities re-enter PVS, the server no longer dumps the whole entity state; it only sends the changes since the entity was last seen by the client. This requires tracking the last tick at which each entity was seen by each client, which might hurt performance, but on the other hand it also means the server spends less time fetching unnecessary component states. Apparently this was brought up before and some people were opposed to it; I imagine it's better to cache this than to dump whole entity states every time. Alternatively, it could probably be turned into a cvar relatively easily.
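
Roughly, the bookkeeping looks something like the following. Only MsgStateLeavePvs is a real message name here; the class, fields, and entity-id type are illustrative stand-ins.

```csharp
using System.Collections.Generic;

public sealed class PvsVisibilitySketch
{
    // Last tick at which each entity's state was sent to this session.
    private readonly Dictionary<int, uint> _lastSent = new();

    // Entities leaving PVS this tick get collected into a simple, reliably-sent
    // list (the MsgStateLeavePvs described above) instead of being sent as
    // fake null-space transform states.
    public readonly List<int> LeaveList = new();

    public void OnLeftPvs(int uid) => LeaveList.Add(uid);

    // When an entity re-enters PVS, only component states modified after the
    // tick it was last seen need to be collected, instead of the full state.
    public uint GetResendFromTick(int uid)
        => _lastSent.TryGetValue(uid, out var tick) ? tick : 0;

    public void OnStateSent(int uid, uint curTick) => _lastSent[uid] = curTick;
}
```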

When aghosting around, the bandwidth is now more or less the same as it was before, though it is still larger when initially encountering new entities. That could still be reduced by lowering MsgState.ReliableThreshold.

Misc client state handling changes

  • Rewrites a good chunk of it to (IMO) make it clearer what's going on.
    • Fewer tick-timing variables
    • Fewer functions with side effects that sometimes do, and sometimes don't, make changes to tick-timing variables.
    • Fixes what, AFAICT, were bugs where the last processed tick should have been the last real tick
  • If the client has to run multiple ticks in a single frame (???) it no longer runs prediction ticks for anything but the last tick
    • Previously you could have situations where it would reset > apply > predict > reset > apply > predict > reset > apply > predict before finally rendering; now it should just do reset > apply x3 > predict (see the sketch after this list)
    • Makes some stutters less severe
    • It's still pretty janky though. Somewhat related to #2552?
  • Cleans up the state-application code a bit, and adds separate profiler sub-groups.
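
For the prediction-ordering bullet above, a rough sketch of the re-ordered catch-up loop (all names are illustrative, not the real client game state manager API):

```csharp
using System.Collections.Generic;

public static class CatchUpSketch
{
    public static void RunCatchUp(List<object> pendingStates)
    {
        if (pendingStates.Count == 0)
            return;

        ResetPredictedState();          // reset once

        foreach (var state in pendingStates)
            ApplyServerState(state);    // apply every buffered server tick

        RunPrediction();                // predict only after the last one
    }

    private static void ResetPredictedState() { /* roll back client-side prediction */ }
    private static void ApplyServerState(object state) { /* apply one tick of server state */ }
    private static void RunPrediction() { /* re-run predicted input ticks */ }
}
```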

ElectroJr avatar Jun 29 '22 05:06 ElectroJr

Does this mean we can default network smoothing to 2?

metalgearsloth avatar Jun 29 '22 14:06 metalgearsloth

I did that in a content PR already

mirrorcult avatar Jun 29 '22 14:06 mirrorcult

NFI what I'm looking at with gamestatemanager.

Also sometimes when shutting down I'd get some MsgState exception trying to play the gridshake sound but it wasn't consistent and I'm also not sure if it's on master so idk.

metalgearsloth avatar Jun 30 '22 16:06 metalgearsloth

The PR currently runs into some deletion issues relating to deleting grid entities after they have been sent to null-space. These are the same problems that #3008 needs to deal with, so some form of that PR needs to get merged first.

Also sometimes when shutting down I'd get some MsgState exception trying to play the gridshake sound but it wasn't consistent and I'm also not sure if it's on master so idk.

That's a separate issue and I can reproduce that relatively easily (see #3011).

Does this mean we can default network smoothing to 2?

This PR now changes the minimum buffer size (AFAICT it was just wrong before?), so a value of 2 is now functionally like a value of 1 would have been before this PR. Though AFAIK the buffers weren't actually helping at all with packet loss, and just keeping the value at 2 is probably fine.

ElectroJr avatar Jul 01 '22 07:07 ElectroJr