BizHawk icon indicating copy to clipboard operation
BizHawk copied to clipboard

[Core Port Req.] eien's quickerNES

Open YoshiRulz opened this issue 5 months ago • 22 comments

Merged on 2024-03-22. First dev build is here.

Open tasks:

  • [ ] re-triage remaining issues in the Core: QuickNes label (at the time of the merge, there were 3)
    • [ ] also decide whether to continue using that label or create a new one
  • [ ] update build scripts (done with https://github.com/TASEmulators/BizHawk/commit/d17352a20411a98c76494006d94d9193cbc95a68 ?)
  • [ ] update Nix expression -- assigned to Yoshi
  • [ ] evaluate which services could be implemented

YoshiRulz avatar Jan 25 '24 02:01 YoshiRulz

Parallel to the licensing situation, there are a couple of technical tasks to adopt the new core:

  • [x] Adjust the BizHawk interface to allow selecting FourScore
  • [x] Test audio is still correctly emulated (not automatically tested in the source repository)
  • [x] Add better error handling in QuickerNES: i.e., throwing sensible exceptions as opposed to segfaulting.
  • [x] Make sure all features (search, hex editor, TASstudio, etc...) still work and existing movies sync.
  • [x] Make CIRAM write-able, like in NESHawk

Some discussions to be had

  • [ ] The game Ivan 'Ironman' Super Off-Road supports FourScore. That means it reads 32 bits during the latch phase of controller input reading. And it seems it also uses this value for RNG generation. This is the case even if you use normal joypad inputs (only 8 bits provided). The original QuickNES passes all ones beyond bit 8, while I think NesHawk passes zeroes. Naturally, this causes desync between the two cores. I believe the NesHawk is correct since there should be no electrical signals after 8 bits during latching phase. I would like to fix this in the new core (namely, do what NesHawk does). However, this means that the Super OffRoad movie will desync on QuickNES from now on (and probably sync on NesHawk).
  • [ ] I forgot to document this in the README file, but I removed all overheads (signatures, header name and size) from the save state format. This reduces save state size from 12792 to 12692, but makes them no longer cross-compatible. My tests hash the actual contents of the RAM and not the savestates files 1:1.

If it's ok, I will create a new PR with this integration and close #3839

SergioMartin86 avatar Jan 25 '24 06:01 SergioMartin86

We don't guarantee movie sync or savestate compatibility when a core updates, so those won't be deal-breakers. As you know, QuickNES is our speed-focused core, and any accuracy improvements you can make are just a welcome bonus.

Despite my phrasing, there's no rush here. Focus on problems which affect you before worrying about the BizHawk interface code. And when the time comes, don't forget that the managed-unmanaged interface is also open to change.

YoshiRulz avatar Jan 25 '24 10:01 YoshiRulz

Improve overall emulation performance for modern (x86) CPUs (portability to other systems not guaranteed)

Does this mean it wouldn't build for other CPU archs (e.g. ARM?).

Also, how does this end up affecting performance with older x86 CPUs, ones which would more need the speedup QuickNES provides for a casual user.

CasualPokePlayer avatar Jan 25 '24 12:01 CasualPokePlayer

Improve overall emulation performance for modern (x86) CPUs (portability to other systems not guaranteed)

Does this mean it wouldn't build for other CPU archs (e.g. ARM?).

I'm not sure, I haven't tried it. Since ARM used big endianness, there might be problems. I wish I had a system to try this on.

Also, how does this end up affecting performance with older x86 CPUs, ones which would more need the speedup QuickNES provides for a casual user.

I don't have an old machine to test this on. I tried 2 server-grade computers and my own (a mid-range Ryzen) and all these systems got similar results.

SergioMartin86 avatar Jan 25 '24 16:01 SergioMartin86

ARM used big endianness

Not sure where you heard this, ARM is a little endian CPU (well, in practice it always is for any kind of ARM user)

CasualPokePlayer avatar Jan 25 '24 16:01 CasualPokePlayer

Since waterbox only supports x86-64, other cores being x86-64 only isn't really a deal breaker.

nattthebear avatar Jan 31 '24 12:01 nattthebear

Keep in mind we could simply disable Waterbox and unmanaged cores for an AArch64 build. But I agree it's not a concern here.

YoshiRulz avatar Jan 31 '24 14:01 YoshiRulz

Just a heads up. I'm still polishing details with the core itself. As soon as I'm done I'll work on the Bizhawk interface to add FourScore support and access to the different memory domains.

SergioMartin86 avatar Jan 31 '24 15:01 SergioMartin86

Alright, so I'm done porting the new core into BizHawk (see: #3839). This is a self-sufficient update that could be merged right away.

On the other hand, I still haven't added an interface to enable FourScore. It's been 15 years since I last programmed in C#, so I'm a bit lost as to how to extend the interface. Can you please help me add that option? The idea would be to have a dropdown like in NesHawk:

image

But only options:

  • Unplugged
  • Nes Pad
  • Fourscore (port 1)
  • Fourscore (port 2, it differs in signature)

And then have these changes reflected in the inputs you can make in TAS studio.

image

SergioMartin86 avatar Feb 24 '24 05:02 SergioMartin86

Revisiting your checklist above, is anything still missing (which is crucial for botting)? I quickly tested memory peeking and that works.

I don't have an old machine to test this on. I tried 2 server-grade computers and my own (a mid-range Ryzen) and all these systems got similar results.

I have an Intel i5-2400. On https://github.com/TASEmulators/BizHawk/commit/0857dd6771938a864c31eef891974320a3342c98, I'm averaging ~1400 fps. On https://github.com/TASEmulators/BizHawk/pull/3839/commits/01f223e3ad440be4323dd71a72c702ee06560224 with the same setup, ~1450 fps. It may even be higher, but the fluctuations are too wide for me to tell by eye, and I can't be bothered logging it.

YoshiRulz avatar Mar 20 '24 22:03 YoshiRulz

Revisiting your checklist above, is anything still missing (which is crucial for botting)? I quickly tested memory peeking and that works.

Yes, everything I need is there.

I have an Intel i5-2400. On 0857dd6, I'm averaging ~1400 fps. On 01f223e with the same setup, ~1450 fps. It may even be higher, but the fluctuations are too wide for me to tell by eye, and I can't be bothered logging it.

There are a couple reasons why the overall impact on fps might not be as high:

  1. In my benchmarks, I had tested the core in isolation. When adding the overhead of the interface / C# interaction, some of these gains are diluted.

  2. There is a big optimization that I couldn't implement in the Windows .dll. That is: aligning the CPU main function to 1024 bytes. This is a very long function that needs aligning for it not to cross cache lines, but unfortunately MSVC's C++ compiler does not support it.

These points apply regardless of whether you use a mid-end or server-grade CPU.

SergioMartin86 avatar Mar 21 '24 05:03 SergioMartin86

unfortunately MSVC's C++ compiler does not support it.

Does clang-cl support it? Building under clang-cl/clang where possible would generally be preferred.

CasualPokePlayer avatar Mar 21 '24 05:03 CasualPokePlayer

After a cursory search, seems like clang doesn't support individual function aligning. It does support aligning all functions but this is only reasonable for a small number of bytes.

Afaik, GCC is the only compiler that supports this (very important) feature.

SergioMartin86 avatar Mar 21 '24 08:03 SergioMartin86

There is a big optimization that I couldn't implement in the Windows .dll. [...]

I'm on Linux (which also means the .NET overhead you mentioned is higher).

Yes, everything I need is there.

You're good to rebase then! One big commit is fine. We'll merge this ASAP and deal with secondary things like the Nix expr. and the readme afterwards.

YoshiRulz avatar Mar 21 '24 10:03 YoshiRulz

You're good to rebase then! One big commit is fine. We'll merge this ASAP and deal with secondary things like the Nix expr. and the readme afterwards.

Sync to master HEAD done. No conflicts and everything still works fine

SergioMartin86 avatar Mar 21 '24 15:03 SergioMartin86

Afaik, GCC is the only compiler that supports this (very important) feature.

A bunch of our cores were built with mingw, so why not this one too? I'm pretty sure the initial quicknes core was built with mingw as well.

BTW from my experience with MAME, GCC and clang builds are the fastest, and msvc and clang builds are the smallest, so maybe gcc will make it even faster by default.

vadosnaprimer avatar Mar 21 '24 17:03 vadosnaprimer

I'm pretty sure the initial quicknes core was built with mingw as well.

Yes. I don't know what the current situation is, but at some point the real, actual, production core dll we shipped to end users on windows in release builds was built with mingw.

nattthebear avatar Mar 21 '24 22:03 nattthebear

I'm still interested in seeing @SergioMartin86's reply to the last 2 comments.

vadosnaprimer avatar Mar 22 '24 15:03 vadosnaprimer

I'm still interested in seeing @SergioMartin86's reply to the last 2 comments.

I'd say MinGW is definitely worth a try, especially if it can recognize this annotation:

https://github.com/SergioMartin86/quickerNES/blob/b6e742d5e3fe46e96fce2b198d229201e2967837/source/quickerNES/core/cpu.cpp#L227

SergioMartin86 avatar Mar 22 '24 16:03 SergioMartin86

mingw is just an environment, it has legitimate GCC and the rest of the toolset, just built to work in that environment and produce Windows apps.

vadosnaprimer avatar Mar 22 '24 16:03 vadosnaprimer

@YoshiRulz Could you clarify why this was reopened? Is there outstanding work? Are we doing the function align thing?

nattthebear avatar May 01 '24 15:05 nattthebear

See checklist in OP.

YoshiRulz avatar May 01 '24 15:05 YoshiRulz