Add Pentium Pro processor instructions

Open rodrigoandrigo opened this issue 2 years ago • 24 comments

It would be nice to add PAE (Physical Address Extension), CMOV (Conditional Move) and CX8 (CMPXCHG8B) support for better compatibility.

Keep up the good work.

rodrigoandrigo avatar Jul 08 '22 02:07 rodrigoandrigo

These are all features first introduced in the Pentium Pro processor (predecessor to the Xeon processors) which weren't really intended for games of that era. Do you know many games that would profit from having these instructions?

In my opinion, DOSBox Pure is focused on running games up to around, let's say, 1997-ish. Anything newer than that is cool if it runs playably for people, but it might ask a bit too much of good old DOSBox. More specialized virtual machine solutions or something like Wine might make more sense at some point.

schellingb avatar Jul 08 '22 16:07 schellingb

> In my opinion, DOSBox Pure is focused on running games up to around, let's say, 1997-ish. Anything newer than that is cool if it runs playably for people, but it might ask a bit too much of good old DOSBox.

You underestimate the power of your creation :) Since 1999, Voodoo 1/2 has been considered obsolete. Not many 3D-accelerated games from 2000+ will run fast enough on this hardware (if at all). But some do, and if you set the core to dynamic and disable "force normal core", most of those games run smoothly at 100% emulation speed with the CPU set to "Pentium 2 300mhz, 200000 cycles". Examples: Croc, Star Wars Racer, Tomb Raider 4.

And if we take 2D games, we can go even further. All of my 2D games up to 2001 (Civilization III) run flawlessly. Unfortunately, some games do not work at all, but if a game works, it runs fast enough to be fully playable!

xttx avatar Jul 09 '22 14:07 xttx

> In my opinion, DOSBox Pure is focused on running games up to around, let's say, 1997-ish. Anything newer than that is cool if it runs playably for people, but it might ask a bit too much of good old DOSBox.

Have you seen the Digital Foundry video on DOSBox-Pure? It was a nice surprise seeing this! They have DBP running on an Xbox Series X. There are some interesting benchmarks too, which replicate my experience on PC, where using 3dfx acceleration is actually slower than software rendering. That is the opposite of real hardware, of course, but it makes sense because the Voodoo 1 has to be emulated, which stresses the CPU more. With a Threadripper 3960X I can run Quake 2 at 1024x760 at 60fps in software rendering mode, but only 640x480 at 45 - 60fps with acceleration.

If you're interested, I can post some benchmarks with AMD Zen 4/Ryzen 7000 or Intel 13th gen Raptor Lake CPUs when these come out in a few months. Performance-wise, I'm expecting them to run 3D games from 2000 at very playable frame rates. The biggest problem will likely be the small 12MB VRAM of the Voodoo 1 at that point.

PoloniumRain avatar Jul 10 '22 22:07 PoloniumRain

@PoloniumRain, Voodoo benchmarks on Zen4/Raptor Lake will be very interesting, yes! Please post those.

It seems at one point even if you emulate Voodoo 5 6000 (via multiple host cores), the emulated CPU performance can be a bottleneck as it relies on a single host thread (thus #370).

Although that depends on the workload: these benchmarks show that games benefit from CPU emulation speed matching a Pentium II 450, so host requirements shouldn't be outrageous.

Torinde avatar Dec 18 '23 06:12 Torinde

> Voodoo benchmarks on Zen4/Raptor Lake will be very interesting, yes! Please post those.

I didn't get either of those CPUs in the end, but I'll definitely get Zen 5/9950X (or whatever it will be called) whenever the X3D version is released. I'll benchmark that :)

PoloniumRain avatar Dec 21 '23 18:12 PoloniumRain

There is a DOSBox SVN patch for PPro and its DOSBox-X implementation (there are further additions to bring that to PII/III level).

Torinde avatar Mar 10 '24 12:03 Torinde

> With a Threadripper 3960X I can run Quake 2 at 1024x760 at 60fps in software rendering mode, but only 640x480 at 45 - 60fps with acceleration.

What do you mean by "software rendering" vs "with acceleration" - the renderer selected in Quake2?

What was the CPU utilization on the host while doing those tests? I assume one thread was maxed out in both cases. When emulating the Voodoo ("with acceleration"?), how many threads were utilized?

For reference - Quake2 OpenGL 1024x768 (Voodoo2 SLI - 2 cards):

  • 44fps on K6-2 333MHz without 3DNow! driver
  • 67fps on K6-2 333MHz with 3DNow! driver
  • 69fps on P2 333MHz

So, I'm also wondering if emulating 3DNow! will help, since:

  • on real hardware it brings 50% improvement
  • enabling MMX emulation in Staging improved the speed several-fold! (for some tests that support MMX)
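
As a quick check of the "50%" figure against the K6-2 numbers listed above (44fps without and 67fps with the 3DNow! driver), here is a small sketch. The function name `percent_gain` is made up for this example.

```cpp
#include <cassert>

// Relative gain of new_fps over base_fps, in percent.
// Hypothetical helper for illustration only.
inline double percent_gain(double base_fps, double new_fps)
{
    assert(base_fps > 0.0);  // a zero baseline would divide by zero
    return (new_fps - base_fps) / base_fps * 100.0;
}
```

With the figures above, `percent_gain(44.0, 67.0)` comes out at a little over 52, so "50%" is a fair rounding.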

Torinde avatar Apr 10 '24 22:04 Torinde

I was running this excellent DOS port of Quake II and using the game's 3dfx OpenGL renderer, so it's using Voodoo. That's what I meant by acceleration.

This is just a very quick (and bad) test with 3dfx OpenGL but surprisingly a decent number of threads are used...

[Screenshot: Q2perf]

I have about 300 browser tabs open though... But roughly 85% of the CPU usage is from Q2. I also expected one thread to be at 100%, but instead 10 or so threads show increased usage, with 6 of them increasing more than the others. Task Manager is poor for monitoring this stuff though, so I wouldn't pay that much attention to it.

And those MMX results are interesting! But I can't wait for new CPUs like Zen 5 to arrive so I can finally stop caring about these things. I mean, I'm sure these will be able to brute force almost any emulator/settings for the next several years, maybe even PCem with Voodoo 3 + the fastest supported Pentium II, which is 450MHz. P2 233MHz is the limit for my 3960X.

PoloniumRain avatar Apr 12 '24 23:04 PoloniumRain

DOSBox Pure emulates the 3dfx Voodoo in software, always with 4 threads. This should probably be changed so that the code figures out how many cores the host CPU has and uses more threads if more are available. I'll put this on the TODO list.
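
A minimal sketch of that idea, not DOSBox Pure's actual code: the helper name `pick_voodoo_threads`, the cap of 16, and the choice to leave one core for the CPU emulation thread are all assumptions made for illustration. The only library call used is the standard `std::thread::hardware_concurrency()`, which may return 0 when the core count is unknown, so the sketch falls back to the current fixed count of 4 in that case.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the DOSBox Pure source): choose a worker
// thread count for software Voodoo rendering.
// detected_cores is the value reported by std::thread::hardware_concurrency(),
// which is allowed to be 0 if the count cannot be determined.
inline unsigned pick_voodoo_threads(unsigned detected_cores,
                                    unsigned baseline = 4,     // fixed count mentioned in the thread
                                    unsigned max_threads = 16) // assumed upper bound
{
    assert(baseline >= 1 && baseline <= max_threads);
    if (detected_cores == 0) return baseline;  // unknown host: keep current behaviour

    // Assumption: leave one core free for the single-threaded CPU emulation.
    unsigned spare = detected_cores > 1 ? detected_cores - 1 : 1;

    // Never drop below the baseline, never exceed the cap.
    return std::max(baseline, std::min(spare, max_threads));
}

// Convenience wrapper that queries the host.
inline unsigned pick_voodoo_threads_for_host()
{
    return pick_voodoo_threads(std::thread::hardware_concurrency());
}
```

Under these assumptions, a host reporting 48 hardware threads would get the cap of 16, while a quad-core would stay at 4.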

schellingb avatar Apr 13 '24 00:04 schellingb

Thanks, @PoloniumRain, for the further info! OK, so you are saying that this Quake2 port is faster in its "software rendering running on CPU emulated by Pure 1 thread" mode than in its "OpenGL rendering mode running on Voodoo emulated by Pure 4 threads + CPU emulated by Pure 1 thread" mode? And that's with plenty of physical cores on the host... Does that mean the graphics quality is different? E.g. "software rendering mode" has much less fidelity than "OpenGL mode"? Otherwise I don't get it...

@schellingb, great to hear that, looking forward to seeing what that will deliver on Threadripper (OK, and on most regular modern CPUs, which go well above 4 threads).

Torinde avatar Apr 15 '24 07:04 Torinde

It's simple: both the Digital Foundry video and I are only saying that it makes sense that running a game with Voodoo acceleration would produce a lower frame rate than software rendering. It's not specific to Quake II, that's just an example. It happens with 100% of games, and it's because emulating a 3dfx Voodoo GPU, or literally any GPU for that matter, requires more processing power from the host machine. Just like running a PS3 emulator will be slower than running a PS2 emulator, because the PS3 is a far more powerful console that requires much faster host hardware for emulation.

So while a physical GPU in a 1990s PC will make games run at higher frame rates, with emulation any emulated GPU has the exact opposite effect, even in cases where a game may have identical graphics and resolution to software rendering (but Q2 isn't one of those games; it looks very different with the Voodoo. Arguably much worse aesthetically but technically better lol).

But if many more threads can be used, then I'd expect the frame rate gap to disappear...

PoloniumRain avatar Apr 15 '24 17:04 PoloniumRain

OK, so basically a game developer can write a more efficient 'direct' software renderer than the Glide emulation developer can (due to the lack of API overhead). Of course, if the result looks different, that's another thing to consider.

Let's see how many host threads are needed to bridge that gap.

Bochs extends the DOSBox Voodoo emulation to Banshee and Voodoo3 with further improvements, an LGPL BIOS, etc., and there is also one attempt to add Voodoo3 VGA to DOSBox-X.

Torinde avatar Apr 16 '24 06:04 Torinde