sbox-issues icon indicating copy to clipboard operation
sbox-issues copied to clipboard

Toggling Sync vars can end up stuck in an old state if ACKs are dropped or delayed

Open andy013 opened this issue 3 months ago • 3 comments

Branch

staging

Describe the bug

If you have a Sync var that is toggling between 2 values, it can sometimes end up stuck in an old state for clients. I believe this issue is caused by the code in DeltaSnapshotSystem.SendCluster and DeltaSnapshotSystem.SendSnapshot.

This code only sends Sync var values if the new value is not equal to the last ack'd value:

            foreach (DeltaSnapshot.SnapshotDataEntry entry in snapshot.Entries)
			{
				int slot = entry.Slot;
				byte[] value2 = entry.Value;
				if (!value.TryGetValue(slot, out var value3, Time) || !value2.SequenceEqual(value3))
				{
					snapshotData[slot] = value2;
					value.AddPredicted(slot, value2, Time);
				}
			} 

This works fine if the variable keeps changing, but the problem happens when the last ack'd value is different from the clients current value (if the ack was dropped or delayed) and when the owners current value is the same as the last ack'd value.

I've made a simple example project with 2 Sync vars. One is a Counter that increments by 1 each fixed update. The other is a Flipper that is set to either 0 or 1 depending on if the Counter is even or odd.

Here is what I believe is happening:

  1. Counter = 0, Flipper = 0

  2. Counter = 1, Flipper = 1 (state different, both sent to client)

  3. Counter = 2, Flipper = 0 (state different, both send to client)

  4. Ack for 1 received, ack for 2 delayed or dropped.

  5. Counter = 3, Flipper = 1, owner compares new values to previously ack'd values Counter = 1, Flipper = 1. Counter is different so it is sent, Flipper is equal to last ack'd value, so it is not sent.

  6. Client receives 4 which only changes Counter, resulting in Counter = 3, Flipper = 0 (incorrect state)

In my example project this is corrected on the very next frame, but I believe the value could be stuck for as long as it takes the owner to network a change again. It's just hard to get the ack dropped on demand to create that situation.

To Reproduce

  1. Download and open sync_bug.zip.
  2. Hit play.
  3. Join with a second instance.
  4. On the second instance, enter the console command net_fakelag 100.
  5. Observe logs in console.

Expected behavior

The owner should always network the correct state to the client. (i.e., Flipper is 0 when Counter is even, and 1 when Counter is odd).

Media/Files

No response

Additional context

No response

andy013 avatar Oct 06 '25 14:10 andy013

This is what I don't get:

Counter = 3, Flipper = 1, owner compares new values to previously ack'd values Counter = 1, Flipper = 1. Counter is different so it is sent, Flipper is equal to last ack'd value, so it is not sent.

Here you say it's comparing Flipper (1) against previously ack'd value where Flipper is 1 - so therefore it makes sense that Flipper would not be sent, and why would Flipper be 0 on the client if they've acknowleged that it's 1?

kurozael avatar Oct 07 '25 11:10 kurozael

It's because the client's actual state and the host's "last acknowledged state" for that client have become different.

The client did receive the update to Flipper = 0, but the acknowledgment for that specific update was dropped. The host, not having received that ack, still believes the client's last confirmed state is Flipper = 1.

To elaborate a little more on this issue. I left out the fact that the host saves predicted data and uses that to "guess" what the value is on the client. The problem is that the predicted data is cleared when ANY acknowledgement is received, even one for an older snapshotID. A key factor in this issue is that the clients ack for (1,1) must be received by the host after it has saved (2,0) into the predicted data. Then when the ack for (1,1) comes in, it runs this code:


    [Description("Update the value in the stored snapshot.")]
    [SourceLocation("Scene\\Networking\\DeltaSnapshots\\SnapshotState.cs", 58)]
    public void Update(int slot, ushort snapshotId, byte[] data)
    {
      SnapshotState.Entry entry;
      if (this.Data.TryGetValue(slot, out entry) && !SnapshotState.IsNewer(snapshotId, entry.SnapshotId))
        return;
      this._predictedData.Remove(slot);
      this.Data[slot] = new SnapshotState.Entry(snapshotId, data);
    }

This is hard to spot because it looks like the snapshot state is checked to see if it's newer, however this is not checking to see if the predicted data is newer. It's only checking this.Data which is the last ack'd state and then if that is older we just delete any predicted data, even if that is newer. Then when the host runs TryGetValue it returns (1,1) and the flipper isn't sent because it matches.

Here's what happens:

  1. Counter: 0, Flipper: 0
  2. Counter: 1, Flipper 1 (state different, both sent)
  3. Counter: 2, Flipper 0 (state different from predicted data (1,1), both sent)
  4. Ack received for 1. Predicted data removed. Ack for 2 dropped (but still received by client).
  5. Counter 3, Flipper 1 (Counter different, Flipper the same as previous ack'd value, not sent)

A fix could be to check the snapshotID of the predicted data and then only remove it if it matches the ack snapshotID. Although this would still leave a possible race condition if the predicted data expired (after it's 250ms window) and the variable was changed to the previously ack'd value at the same time. I think you also need to have a forced resend if the predicted data expires and it hasn't been ack'd.

andy013 avatar Oct 07 '25 15:10 andy013

I just want to add that I think the current resending method isn't great. To wait 250ms before you resend the data seems like an awful long time. That means if you drop 1 packet then at minimum there will be a 250ms hitch, even if your update rate is much faster than that. I don't know if there are performance reasons for this but I think it would be much better to just resend all of the dirty variables each update until they are acknowledge by the client.

You could keep a list of all the dirty variables on each snapshot and then union all of them since the last acknowledge snapshot from the client. Then just keep sending them every update until the client ack is received. Yes this means sending data multiple times but it also means if you drop a packet, the very next one has all the data and so you recover extremely quickly.

I also don't understand why each GameObject update is sent individually rather than batching all the changed GameObjects together. It can cause weird situations where the host updated 2 GameObjects at the same time, but the client only updates 1 and not the other. I know they are clustered together when sent, but I'd be nice if there was an option to have atomic updates, guaranteed to update everything at once or nothing.

andy013 avatar Nov 13 '25 06:11 andy013