scc-lan-restore

Out of sync if not same cpu

Open vince1016 opened this issue 1 year ago • 31 comments

I know this is something you have already found and that you don't fully understand the code, but I just want to let you know it would be great to resolve. I think it's really related to not having the same CPU brand. Will test it when I can. I can join with no problem over Hamachi on Windows 10, though.

vince1016 avatar Apr 18 '24 02:04 vince1016

yeah.. that would be great indeed. i'm very busy at the moment however. if anyone has findings on their side they can post them in this issue.

Ododo avatar Apr 20 '24 00:04 Ododo

Hi! First of all, thanks a lot for your work. My brother and I have wanted to play this game in co-op for a long time, and now we are closer than ever to being able to!

I just want to let you know that there is indeed a problem with Ryzen (or at least with the Intel/Ryzen co-op combo). I tried to play on my main PC, which has an R9 5900X, with my brother, who has an Intel Core i5, and at the first input we get disconnected.

I then tried on my laptop, which is Intel-based, and it worked! On the first try we got a disconnection, but the second try worked!

If you find something to correct this issue, that would be awesome.

Thanks again !

Splainte avatar Apr 28 '24 19:04 Splainte

if anyone has findings on their side they can post them in this issue.

I checked and the error displayed has "DivergenceMessageBoxMessage" id in localization files, and that id is used only by one function.

...
      *(float *)&dword_142EB64 = *(float *)&dword_13A31CC + *(float *)&dword_142EB64;
      if ( *(float *)&dword_142EB64 >= 2.0 && sub_8C4079((_DWORD **)dword_142EB7C) )
      {
        v3 = sub_792B49();
        v4 = (*(int (__thiscall **)(int))(*(_DWORD *)v3 + 496))(v3);
        if ( !(*(int (__thiscall **)(int))(*(_DWORD *)v4 + 400))(v4) )
        {
          if ( dword_142EB7C )
            sub_8C409B(dword_142EB7C);
          sub_40C3A0((void *)&Caption);
          sub_40C3A0((void *)&Caption);
          sub_40C3A0((void *)&Caption);
          sub_40C3A0((void *)&Caption);
          sub_77BB27(83, v14, v12, 1, 0, 0, v15, v13, 1, 1, -1.0);
          sub_40C40E(v14);
          sub_40C40E(v12);
          sub_40C40E(v15);
          sub_40C40E(v13);
          v9 = 11;
          v5 = sub_445900((int)&off_10D3024, (int)L"DivergenceMessageBoxTitle", (int)"Localization\\System", 0, 0);
          sub_40DBB2(v10, v5);
          v6 = sub_445900((int)&off_10D3024, (int)L"DivergenceMessageBoxMessage", (int)"Localization\\System", 0, 0);
          sub_40DBB2(v11, v6);
          v11[7] = dword_142EB84;
          v11[5] = 1;
          v11[6] = 1;
          v7 = sub_792B49();
          (*(void (__thiscall **)(int, int *, int))(*(_DWORD *)v7 + 340))(v7, &v9, 3);
          sub_77BC47(&v9);
        }
...

Tried to disable this code path, but the game just gets stuck without an error message. This code is not executed until the button is pressed, or rather character movement occurs.

int __usercall sub_7C4A7A@<eax>(MassiveAdClient3::CMassiveAsset *a1@<ebx>, float a2)
{
  int result; // eax
  int v3; // esi

  result = sub_7C0894();
  if ( result )
  {
    sub_45FA0A();
    sub_45F36B(0);
    v3 = sub_7C2FCF(LODWORD(a2)); // this is executed always
    sub_45FA0A();
    result = sub_45F36B(1);
    if ( v3 ) // false until player moves and out of sync occurs
    {
      result = sub_7C0745();
      if ( !result )
        return sub_7C465E(a1); // out of sync error message displayed here
    }
  }
  return result;
}

Overall it might be an issue described here, since many people confirmed different CPUs cause out of sync error.

https://cookieplmonster.github.io/2020/07/19/silentpatch-mass-effect/#part-3

ThirteenAG avatar Jul 23 '24 10:07 ThirteenAG

if anyone has findings on their side they can post them in this issue. Overall it might be an issue described here, since many people confirmed different CPUs cause out of sync error.

https://cookieplmonster.github.io/2020/07/19/silentpatch-mass-effect/#part-3

I can send you some Wireshark records from intel-ryzen and intel-intel sessions, if it may help in research

ghost avatar Jul 23 '24 11:07 ghost

I wouldn't know what to do with them, just upload them here and maybe someone else can take a look.

ThirteenAG avatar Jul 23 '24 11:07 ThirteenAG

Thanks for sharing @ThirteenAG. I searched a bit for SSE-related issues but did not come across this article. We might as well just blindly test the solutions proposed there then..

That's where I stopped so far: dword_13A31CC or dword_142EB64 seems to be some kind of divergence accumulator whose main calculation resides in a function syncing both players. It is the same routine that caps both players at a fixed framerate (30).

Function starts like this:

void FUN_0042ce22(int *param_1)
  ...
  (**(code **)(*(int *)PTR_PTR_01265ee4 + 0xac))(1);
  local_c = 0x78;
  (**(code **)(*DAT_013a0fe0 + 4))("Engine.Display","MaxFPS",&local_c,"User.ini");
  iVar5 = 0x1e;
  if ((0x1d < local_c) && (iVar5 = 1000, local_c < 1000)) {
    iVar5 = local_c;
  }
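
Read as plain C, the decompiled prologue above appears to read MaxFPS from User.ini (with 0x78, i.e. 120, as the value used when nothing is read) and then clamp it into the range 30 to 1000. Below is a minimal sketch of that reading. The name clamp_max_fps and the signed int type are assumptions for illustration, not symbols or types confirmed from the game, and how this value relates to the 30 fps co-op cap is not visible in the snippet.

```c
#include <assert.h>

enum { MIN_FPS = 30, MAX_FPS = 1000, DEFAULT_FPS = 120 };

/* Hypothetical helper mirroring the decompiled branch:
 *   iVar5 = 0x1e;
 *   if ((0x1d < local_c) && (iVar5 = 1000, local_c < 1000)) iVar5 = local_c;
 * Assumes local_c is a signed int; the decompiler output does not say. */
int clamp_max_fps(int configured)
{
    int result = MIN_FPS;              /* iVar5 = 0x1e */
    if (configured > MIN_FPS - 1) {    /* 0x1d < local_c */
        result = MAX_FPS;              /* iVar5 = 1000 */
        if (configured < MAX_FPS)      /* local_c < 1000 */
            result = configured;
    }
    assert(result >= MIN_FPS && result <= MAX_FPS);
    return result;
}
```

So a MaxFPS of 10 would become 30, 2000 would become 1000, and anything in between is kept as configured.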

Then something like:

For frame:
  float delayAcc;
  float tpf = 1.0 / MAXFPS;  // time per frame
  s = QueryPerformanceCounter();  // portable across platforms
  do_stuff_for_player1();
  a = QueryPerformanceCounter();
  probably_do_stuff_with(a-s)(&delayAcc);
  do_stuff_for_player2();
  b = QueryPerformanceCounter();
  probably_do_stuff_with(b-a)(&delayAcc);
  diff = b - s;
  if (diff > tpf) { delayAcc += (diff - tpf); }
  if (diff < tpf) { wait_until_frame_is_finished; go to next frame }
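
The accumulator part of that pseudocode can be sketched as a small pure function, which makes the behaviour easier to reason about without the timer calls. This is only an illustration under assumptions: SyncState, sync_state_init and sync_state_step are hypothetical names, and tying the 2.0 limit (taken from the `>= 2.0` check in the decompiled divergence function earlier in the thread) to this particular accumulator is a guess, not something confirmed from the binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit, borrowed from the ">= 2.0" comparison in the decompile. */
#define DIVERGENCE_LIMIT_SECONDS 2.0

typedef struct {
    double time_per_frame;  /* tpf in the pseudocode */
    double overrun;         /* delayAcc in the pseudocode */
} SyncState;

SyncState sync_state_init(int max_fps)
{
    assert(max_fps > 0);
    SyncState s = { 1.0 / (double)max_fps, 0.0 };
    return s;
}

/* Feed one measured frame duration in seconds.
 * Returns true once the accumulated overrun reaches the limit. */
bool sync_state_step(SyncState *s, double frame_seconds)
{
    assert(s != NULL && frame_seconds >= 0.0);
    if (frame_seconds > s->time_per_frame)
        s->overrun += frame_seconds - s->time_per_frame;
    return s->overrun >= DIVERGENCE_LIMIT_SECONDS;
}
```

In this model frames that finish early never pay back the debt, so a machine that only occasionally overruns its frame budget would still trip the limit eventually.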

Disabling the threshold completely gives pretty much the same results as yours, but I get a generic error message instead, something like "connection with remote player lost". So I was even wondering if the divergence is the main issue here, or if it is just the result of some previous operation failing that the game happens to catch as a divergence event.

Ododo avatar Jul 27 '24 00:07 Ododo

Something I also noticed while testing coop with two instances of exe, after initial movement game freezes for a bit, and normally after that, when playing with another pc, out of sync error appears, yet locally it unfreezes and continues. Maybe the problem is localized to whatever is happening during that freeze.

ThirteenAG avatar Jul 27 '24 01:07 ThirteenAG

Overall it might be an issue described here, since many people confirmed different CPUs cause out of sync error.

https://cookieplmonster.github.io/2020/07/19/silentpatch-mass-effect/#part-3

Unless I did something wrong, disabling PSGP did not help. Tried with DisablePSGP, DisableD3DXPSGP and DisableD3DX10PSGP, on both Intel and AMD machines.

Do note that for dx9 the variable is DisableD3DXPSGP, but it is DisableD3DX10PSGP for dx10.

I wonder what could happen if we spoof cpuid though..

Ododo avatar Jul 28 '24 17:07 Ododo

Well, I tried to nop all cpuid instructions, doesn't make any difference.

ThirteenAG avatar Jul 29 '24 03:07 ThirteenAG

Just played the first coop map with a friend. No issues whatsoever, silky smooth 60fps. Used the scc_lan_helper EXE version.
My CPU - Ryzen 9 5900X
His CPU - Ryzen 7 3800XT

2nd map - quite some desyncs. 3rd map - no desyncs.

Not sure if these are map related things or not.

r3538987 avatar Aug 17 '24 17:08 r3538987

@r3538987 my CPU: AMD Ryzen 5 6600H - Shall we play together in co-op mode?

englerd avatar Oct 01 '24 10:10 englerd

Overall it might be an issue described here, since many people confirmed different CPUs cause out of sync error.

https://cookieplmonster.github.io/2020/07/19/silentpatch-mass-effect/#part-3

I have tried to put this dll into the game folder but it won't work because Conviction uses d3dx9_41.dll not d3dx9_31.dll

Unrelated but for your fusion mod, if you add a file to the game folder called steam_appid.txt and put inside that 33220 it will let the game open directly from the EXE without giving the weird Steam error. Steam Overlay isn't loaded this way but otherwise launches okay.
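
For anyone who wants to script that step, here is a minimal sketch that writes the file. write_steam_appid is a hypothetical helper, not part of the mod; the only facts taken from the comment above are the file name steam_appid.txt and the app id 33220, and the file is assumed to contain nothing but the id as decimal text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the app id as decimal text into `path`.
 * Returns 0 on success, -1 if the file could not be written. */
int write_steam_appid(const char *path, unsigned long app_id)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%lu", app_id) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_steam_appid("steam_appid.txt", 33220UL) from the game folder would produce the file described above.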

agret avatar Oct 15 '24 07:10 agret

Unrelated but for your fusion mod, if you add a file to the game folder called steam_appid.txt and put inside that 33220 it will let the game open directly from the EXE without giving the weird Steam error. Steam Overlay isn't loaded this way but otherwise launches okay.

It works with steam version, or any version rather.

ThirteenAG avatar Oct 15 '24 13:10 ThirteenAG

Your fusion mod patch works with any version of the game; it's the DirectX patch from the Mass Effect SilentPatch, the one that redirects the math functions, that I'm talking about. Splinter Cell targets a slightly newer version of the dll, so just dropping that one in doesn't work. Neither does renaming it to d3dx9_41.dll: the newer file has some additional functions, so the game just gives a missing export error when trying to launch. To fix those DirectX math functions we need an updated version of the dll targeting d3dx9_41.dll.

or did I misunderstand this reply? I haven't tested the steam_appid.txt addition on any non-steam version of the game to see if it causes any issue but it shouldn't do anything with that file unless it has the Steam imports in the game exe. This fix will work for any Steam version of Splinter Cell Conviction yes, doesn't matter what update it's on. Originally I thought we might need to add steam_api.dll or steamclient.dll but seems that it is statically linked to the game exe on Steam version so the only file we need to add is the missing steam_appid.txt

agret avatar Oct 16 '24 15:10 agret

I tried the dll and it doesn't make any difference. The problem is in exe.

ThirteenAG avatar Oct 16 '24 15:10 ThirteenAG

This thread is dedicated to solving this particular issue, so please only post what is relevant.

I'll just post this: https://github.com/openssl/openssl/issues/2848 , as we might want to look into something similar, although it does not match the fact that nopping cpuid changes nothing. Conviction embeds openssl 0.9.8l

Ododo avatar Oct 16 '24 20:10 Ododo

For the love of god, you should first try to understand your constraints maybe. The problem in Mass Effect is that AMD cpus have a slightly different SSE accuracy (ostensibly still within the IEEE 754 spec), which alas is still enough to break some oversensitive shader that had always flown under the radar with Intel's tolerances or 3DNow. Now, I don't know a thing about SCC, but to be sure they aren't using the direct3d processor specific geometry pipeline for network purposes (similarly it also seems incredibly unlikely for them to have compiled the main executable with /fp:fast). And least of all, openssl missing support for ryzen SHA hardware instructions couldn't have anything to do with anything.

This said QPC was mentioned above, and despite what the close comment says that is in fact a very shaky thing. https://news.ycombinator.com/item?id=7923135 https://forums.blurbusters.com/viewtopic.php?f=10&t=11951&start=40 https://stackoverflow.com/questions/35601880/windows-timing-drift-of-performancecounter-c/#35603807 Like, not sure I would call that Zen 2 and 3 test a success, if even a single case wasn't smooth. Are you confident it's a mismatched cpu thing, rather than just many systems being awful and of course the probability of two random users having the same hardware is low? You could try tinkering with tscsyncpolicy, useplatformclock and useplatformtick.

mirh avatar Nov 21 '24 23:11 mirh

Hello @mirh

For the love of god, you should first try to understand your constraints maybe.

Please, the only people who deserve to be yelled at are probably Ubisoft for delivering a broken game, not the ones trying to fix it.

don't know a thing about SCC, but to be sure they aren't using the direct3d processor specific geometry pipeline for network purposes

From what i know the game uses a replay system for multiplayer (2 players), every input is sent and replayed on remote machine. The sync loop is tied to the ability of the 2 PCs to handle frames at a specific frame rate. Differences in frame processing timing could lead to desync.
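
To make that failure mode concrete, here is a toy model of two machines replaying the same input stream. It is only an illustration under assumptions: PlayerState, apply_input and first_divergent_step are hypothetical names, the raw-memory comparison stands in for whatever check the game really performs (nothing in this thread establishes how it detects divergence), and the differing step lengths dt_a and dt_b are just one stand-in for "something numerically different between the two PCs".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    float x;      /* position along one axis */
    float speed;  /* units per second */
} PlayerState;

/* Apply one replayed input (-1, 0 or +1) for one simulation step. */
void apply_input(PlayerState *p, int input, float dt)
{
    assert(p != NULL);
    p->x += (float)input * p->speed * dt;
}

/* Replays the same inputs on two copies of the state, one per machine.
 * Returns the index of the first step after which the copies differ
 * bit for bit, or -1 if they never do. */
long first_divergent_step(PlayerState a, PlayerState b,
                          const int *inputs, size_t count,
                          float dt_a, float dt_b)
{
    assert(inputs != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        apply_input(&a, inputs[i], dt_a);
        apply_input(&b, inputs[i], dt_b);
        if (memcmp(&a, &b, sizeof a) != 0)
            return (long)i;
    }
    return -1;
}
```

In this toy model, steps with no movement input never diverge even when the two step lengths differ, while the first movement input does, which at least resembles the "fine until a player moves" behaviour reported earlier in the thread.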

This said QPC was mentioned above, and despite what the close comment says that is in fact a very shaky thing.

Indeed, oscillators are shaky things

https://news.ycombinator.com/item?id=7923135 https://forums.blurbusters.com/viewtopic.php?f=10&t=11951&start=40 https://stackoverflow.com/questions/35601880/windows-timing-drift-of-performancecounter-c/#35603807

Both my Intel (old) and AMD (recent) platforms come with HPET disabled and invariant TSC mode at a fixed frequency (10 MHz), and desync still occurs. I have assumed most of the systems have the same modes by default, which is why it looks "portable" enough for me, but it might be wrong and there can be subtleties for instance in multi-thread context.
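
For scale, this is what a 10 MHz counter means in frame terms. The sketch below is plain arithmetic with hypothetical helper names (ticks_to_seconds, ticks_per_frame); on Windows the frequency value would come from QueryPerformanceFrequency, and 10,000,000 is simply the figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick delta to seconds for a counter running at frequency_hz. */
double ticks_to_seconds(int64_t ticks, int64_t frequency_hz)
{
    assert(frequency_hz > 0);
    return (double)ticks / (double)frequency_hz;
}

/* Whole ticks available per frame at a given frame rate (truncating). */
int64_t ticks_per_frame(int64_t frequency_hz, int fps)
{
    assert(frequency_hz > 0 && fps > 0);
    return frequency_hz / fps;
}
```

At 10 MHz one tick is 100 ns, so a 30 fps frame budget is 333,333 ticks and a 60 fps budget is 166,666 ticks.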

Are you confident it's a mismatched cpu thing, rather than just many systems being awful and of course the probability of two random users having the same hardware is low?

I would not say confident, but this CPU mismatch theory is what was observed so far by the community.

You could try tinkering with tscsyncpolicy, useplatformclock and useplatformtick.

Tried a bit with platform clock/tick a while ago. might look on the policy side

Kr

Ododo avatar Nov 22 '24 16:11 Ododo

Differences in frame processing timing could lead to desync.

I'll grant that after seeing EAX bugs causing glitches in doom 3 because the X-Fi patch synchronized some visual effects with sound, nothing surprises me anymore.. but graphics doesn't seem the kind of thing that should have a feedback on the game loop. And it still doesn't appear to be something that a solid fps cap couldn't handle.

I have assumed most of the systems have the same modes by default

Ryzen in particular is an absolute clusterfuck of defaults and recommendations. Anandtech even had to review again Zen+ because of HPET voodoo. Let alone that I don't even know the headache it gives on older cpus when coupled with spectre/meltdown mitigations. And I just found more.

but it might be wrong and there can be subtleties for instance in multi-thread context.

Then does it solve anything to force the game on a single core? Or I guess booting with a single core.

I would not say confident, but this CPU mismatch theory is what was observed so far by the community.

So... I haven't played the game in like 13 years give or take, but do we have at least a solid understanding of when it does work?

mirh avatar Nov 22 '24 23:11 mirh

It doesn't work, even on same pc playing two instances shows out of sync often. I barely managed to complete campaign, and some deniable ops.

ThirteenAG avatar Nov 23 '24 02:11 ThirteenAG

I was kinda fearing that'd be the case, with enough pedantry. I suppose the gold standard for testing could be two VMs on the same pc, with just a single core assigned to each, and with processor idle states disabled in the host power settings? Not sure about the right TSC setting.

mirh avatar Nov 23 '24 16:11 mirh

Then does it solve anything to force the game on a single core? Or I guess booting with a single core.

TBD ( testing is time consuming, especially for this game as it requires a bit of a setup )

So... I haven't played the game in like 13 years give or take, but do we have at least a solid understanding of when it does work?

So to clarify a bit i think we might have 2 problems in parallel:

  • One of them is surely bad design. I think the programming model of the multiplayer cannot ensure even 80% reliability, and it might be subject to something like network asymmetry or whatever, as "desync" events have always been frequent. I don't think we will ever be able to solve this one

  • Another is described in https://github.com/Ododo/scc-lan-restore/issues/3#issuecomment-2244856278

    Both players join, one of them issues a keyboard event, both go desync (mouse event only: ok).
    On "incompatible platforms", this always occurs.
    On "compatible platforms" gaming is ok but then might be subject to the first issue.
    It has been observed by the community (does not make it right, and yes i know its sound silly, but havent been able to prove it wrong) that:
      - "incompatible platforms" seems to be Ryzen family cpus versus Others
      - "compatible platforms" seems to be (non-ryzen/non-ryzen) or (Ryzen/Ryzen)

On this last issue i might even think of looking at the input subsystem (for instance what if a keyboard event is not recognized by remote machine) but then it makes no sense to me that it would be related to cpu family. Now what about input delays, this is more related to the platform, and the sync loop might depend on it too.

(edit: + what about windows version ? owner of olders CPUs might also have older windows version (win7 for instance))

See, we are a bit in the dark, that's for sure :)

Ododo avatar Nov 23 '24 22:11 Ododo

, especially for this game as it requires a bit of a setup

Uplay?

I don't think we will ever be able to solve this one

I mean.. With all the modern horsepower of the world, certainly it should be possible to build a controlled enough environment? Even if it means running this inside of 86Box with softgpu or software_D3D9.

It has been observed by the community (does not make it right, and yes i know its sound silly, but havent been able to prove it wrong) that:

I see, so the situation is more like "some systems disconnect, eventually" (i.e. minutes?) and "some systems disconnect right out of the gate"? And when do you say ryzen, do you actually mean ryzens, or is it just a shorthand for amd?

(edit: + what about windows version ? owner of olders CPUs might also have older windows version (win7 for instance))

I don't think the correlation is that solid nowadays (even though I suppose that if you want to have it run on bare metal, the latest few years of hardware have become pretty much a nightmare if not impossible to boot), but certainly testing if XP or Vista or 7 couldn't have a easier life doesn't hurt.

mirh avatar Nov 24 '24 00:11 mirh

I mean.. With all the modern horsepower of the world, certainly it should be possible to build a controlled enough environment? Even if it means running this inside of 86Box with softgpu or software_D3D9.

In the end the goal is that people can actually play the game; if the setup is too constraining it is not worth the effort imho. It's also a question of how much time we want to allocate to this, and personally I think solving the second issue is a reasonable target.

I see, so the situation is more like "some systems disconnect, eventually" (i.e. minutes?) and "some systems disconnect right out of the gate"? And when do you say ryzen, do you actually mean ryzens, or is it just a shorthand for amd?

Yes and i mean Ryzens (not a shorthand for amd)

Ododo avatar Dec 02 '24 18:12 Ododo

Of course that 5D chess setup wasn't going to be a daily driver. But I just meant in order to pinpoint the shaky element.

Yes and i mean Ryzens (not a shorthand for amd)

So, would you say that bulldozer, phenom, jaguar or excavator fit into the same bucket of intels?

mirh avatar Dec 02 '24 19:12 mirh

So, would you say that bulldozer, phenom, jaguar or excavator fit into the same bucket of intels?

I would say yes, based on user reports, my own tests, and the fact that this case of systems disconnecting right out of the gate appeared with the introduction of Ryzens. That makes me think it could be something after all, but it's still theoretical.

Ododo avatar Dec 02 '24 20:12 Ododo

Feedback from my limited testing: 2 local PCs on wired LAN, 60fps vsync (Fusion mod), got disconnected after 5-10 minutes every time on the 1st coop level. 12400F and 14600k, W11 and W10. I will try a 6-core, 6-thread setup at the same 4GHz first, then set manual 1 or 2 core affinity, then lower coop fps to 30. I remember playing this game 10+ years ago, and I don't remember out of sync issues when playing online... but SCCT 15 years ago had the same sync problem, which was also very frustrating.

LSL1337 avatar Dec 04 '24 15:12 LSL1337

Ok, it was working today for 1+ hours, 0.0 desync with 60fps coop. 2 LAN PCs with 0 ping. What I changed:

  • Turned off HT on both the 12400F and the 14600k.
  • Turned off E cores on the 14600k.
  • Didn't change clock speeds: 5.3GHz and 4.0GHz all core.
  • One machine is W10, one is W11.
  • Installed fusion mod, disabled the affinity fix and the blacklist controls. This makes the game only use 4 threads and 4 cores.
  • Lowered display res to 1080p and 1440p to make sure it's never below 60fps.

Later I will test a few things to see which of these could lead to desync. I think E cores or HT will be the main issue. Maybe when the OS moves some threads around, it leads to some issues. Maybe the whole thing can be fixed by using only the first thread of the first 4 cpu cores, so threads 0, 2, 4, 6. Not sure what would happen with HT that way, worth a shot. I will also change back to 4k120 display res with the 60fps coop limit, to see if the performance of the 2 clients has anything to do with it.
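
The "threads 0, 2, 4, 6" idea corresponds to an affinity bitmask, sketched below. first_thread_mask is a hypothetical helper, and it assumes that the two logical processors of physical core k are numbered 2k and 2k+1. That numbering is common with HT enabled but is not guaranteed, and it does not apply when HT is off or on hybrid chips where E cores have a single thread each.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a mask selecting logical processor 2*k for each of the first
 * `physical_cores` cores (assumed sibling numbering: 2k and 2k+1). */
uint64_t first_thread_mask(unsigned physical_cores)
{
    assert(physical_cores >= 1 && physical_cores <= 32);
    uint64_t mask = 0;
    for (unsigned core = 0; core < physical_cores; ++core)
        mask |= (uint64_t)1 << (2u * core);
    return mask;
}
```

For 4 cores this gives 0x55 (binary 01010101). On Windows such a mask could be handed to SetProcessAffinityMask, or tried without any code via `start /affinity 55` from a command prompt.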

LSL1337 avatar Dec 07 '24 15:12 LSL1337

Really thanks for the tests, it seems like it wasn't all that much crazy voodoo after all. Especially if it's either HT or little cores.. that should be actually super easy to fix with windows compatibility shims (or even your low tech cheat).

Btw allow me to introduce our pal, ACPI's Power Management Timer https://sites.google.com/view/melodystweaks/misconceptions-about-timers-hpet-tsc-pmt

mirh avatar Dec 17 '24 22:12 mirh

I was reading a few articles about this, and even on Xbox, people on the Gamespot forums reported out of sync errors quite often. On PC, people concluded years ago that a difference in CPUs may be the problem.

Many tested various options: https://steamcommunity.com/app/33220/discussions/0/3105765614234907411/
Here are a lot of old fps discussions: https://steamcommunity.com/app/33220/discussions/0/1519260397775827213/
Floating point rounding mentioned: https://www.reddit.com/r/Splintercell/comments/14ijkhi/conviction_coop_does_not_work_unless_both_people/

(Just imagine if your computer uptime affects something here as well, as mentioned here by Silent: https://cookieplmonster.github.io/2018/08/07/high-resolution-timers-and-uptime-headaches/) Both the comment above and the blog post mention QueryPerformanceCounter.

Had some thoughts,

  1. The usually tested scenarios are 30-30 and 60-60 fps. Would a session fail straight away if a 30-60 combo is used?
  2. SyncMaxStepFrequency=60 in FusionFix: if it is lowered to 30 and FPS is kept at 60, is this also a no go?
  3. Also Blacklist: that game doesn't have nearly as many desync comments on the web. What if we could take a reverse-engineering peek into the code there and compare? Both are on the same engine at least, so maybe clues for improvements can be found there.

When we played with my friend (specs mentioned above) I liked to take a few screenshots and videos via Nvidia Shadowplay. There were definitely a few cases when my (host) keypress combo triggered slight spikes in frametimes, and shortly after the session went to desync. (Likely captain obvious.) This leads me to think that there is a time window within which certain actions have to be exchanged or happen, otherwise the game closes the session.

r3538987 avatar Mar 20 '25 21:03 r3538987