
taito/taito_f3_v.cpp: regain performance after major rewrite

Open. Opened by y-ack; 2 comments.

Addresses my own concerns about the speed regression in #11811 compared to the previous implementation.

  • switch the AoS z buffers and per-pixel blend info to SoA
  • allow vectorization of the line blending operation
  • regain the empty line optimization by tracking tilemap row usage
  • consolidate sprite framebuffers (we still pull from them multiple times for each sprite priority group)
  • other minor wins from safe logic reorderings
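
A rough sketch of the first three bullets, assuming a heavily simplified line buffer: per-pixel z and blend data live in separate dense arrays (SoA) instead of an array of structs, the blend loop has a branch-free body so it is a candidate for compiler auto-vectorization, and a per-row usage bitset lets a whole line be skipped when its tilemap row is empty. All names (`line_soa`, `blend_line`, `tilemap_usage`, `draw_layer_line`), the sizes, and the blend formula are hypothetical and are not taken from taito_f3_v.cpp.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes for illustration; not the driver's real dimensions.
constexpr std::size_t kWidth = 16;
constexpr std::size_t kRows = 32;

// Before (AoS): all per-pixel state interleaved in one struct.
struct pixel_aos {
    std::uint8_t z;       // priority / depth
    std::uint8_t alpha;   // blend weight, 0..255
    std::uint16_t color;  // unused in this sketch
};

// After (SoA): one dense array per field for a whole line.
struct line_soa {
    std::array<std::uint8_t, kWidth> z{};      // per-pixel priority
    std::array<std::uint8_t, kWidth> alpha{};  // per-pixel blend weight
    std::array<std::uint8_t, kWidth> dst{};    // one color channel of the output line
};

// Blend one layer's channel into the line wherever layer_z >= the stored z.
// The body has no early exits and uses selects instead of branches, which makes
// the loop a candidate for auto-vectorization over the contiguous arrays.
void blend_line(line_soa& line, const std::array<std::uint8_t, kWidth>& src,
                std::uint8_t layer_z) {
    for (std::size_t x = 0; x < kWidth; ++x) {
        const unsigned a = line.alpha[x];
        const unsigned mixed = (src[x] * a + line.dst[x] * (255u - a)) / 255u;
        const bool wins = layer_z >= line.z[x];
        line.dst[x] = wins ? static_cast<std::uint8_t>(mixed) : line.dst[x];
        line.z[x] = wins ? layer_z : line.z[x];
    }
}

// Per-row usage tracking: a row is marked when content is written to it,
// and the whole line is skipped later if its row was never marked.
struct tilemap_usage {
    std::bitset<kRows> row_used;
    void mark(std::size_t row) { row_used.set(row); }
    void clear() { row_used.reset(); }
};

// Returns 1 if the line was blended, 0 if it was skipped as empty.
int draw_layer_line(line_soa& line, const std::array<std::uint8_t, kWidth>& src,
                    std::uint8_t layer_z, const tilemap_usage& usage, std::size_t row) {
    if (!usage.row_used.test(row)) return 0;  // empty row: no per-pixel work at all
    blend_line(line, src, layer_z);
    return 1;
}
```

The point of the SoA split is that the blend loop only streams through the arrays it actually reads, instead of striding over every field of every pixel struct.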

y-ack, Apr 26 '24 14:04

Benchmark: `-window -nomaximize -bench 240 <set>`, 1 run per set. Windows 11 / CI (Windows) / AMD Ryzen 7 7840HS

| set | e967a70 (pre-rewrite) | 563b63fabf7a06c6dc94b48a1db2f8dba7292c15 (rewrite) | 55c60e5 (this PR) |
|---|---|---|---|
| ringrage | 606.71% | 533.30% | 630.98% |
| arabianm | 685.52% | 574.87% | 695.64% |
| ridingf | 635.20% | 520.65% | 608.64% |
| gseeker | 683.36% | 618.98% | 743.95% |
| commandw | 630.25% | 560.72% | 634.03% |
| hthero93 | 717.49% | 588.87% | 710.68% |
| scfinals | 694.80% | 587.99% | 720.88% |
| trstar | 690.49% | 578.01% | 706.09% |
| gunlock | 609.94% | 553.95% | 639.41% |
| lightbr | 668.88% | 545.62% | 651.80% |
| kaiserkn | 637.06% | 558.43% | 649.98% |
| dariusg | 724.94% | 580.85% | 733.24% |
| bubsymphj | 686.23% | 533.20% | 646.04% |
| spcinvdj | 721.39% | 577.61% | 729.34% |
| hthero95 | 667.58% | 559.33% | 681.61% |
| qtheater | 692.09% | 535.16% | 660.61% |
| elvactr | 738.29% | 611.27% | 736.44% |
| spcinv95 | 670.14% | 561.78% | 655.50% |
| twinqix | 721.19% | 581.23% | 699.54% |
| tcobra2 | 639.72% | 544.45% | 607.79% |
| bubblem | 616.16% | 570.59% | 661.85% |
| cleopatr | 593.05% | 497.74% | 606.97% |
| arkretrn | 599.81% | 525.94% | 599.10% |
| kirameki | 698.47% | 570.16% | 673.45% |
| puchicar | 585.04% | 511.76% | 591.27% |
| popnpop | 598.05% | 514.02% | 606.06% |
| landmakr | 735.18% | 578.79% | 700.83% |

Windows 10 / CI (Windows) / Intel Core i5-7300U

| set | e967a70 (pre-rewrite) | 563b63fabf7a06c6dc94b48a1db2f8dba7292c15 (rewrite) | 55c60e5 (this PR) |
|---|---|---|---|
| ringrage | 289.16% | 248.27% | 295.81% |
| arabianm | 313.46% | 272.34% | 333.73% |
| ridingf | 285.56% | 226.03% | 263.36% |
| gseeker | 299.63% | 283.37% | 337.42% |
| commandw | 277.50% | 249.16% | 282.81% |
| dariusg | 322.64% | 277.24% | 332.32% |
| bubblem | 294.36% | 261.99% | 302.12% |
| kirameki | 311.88% | 274.02% | 312.47% |
| puchicar | 251.74% | 206.12% | 267.41% |

Per-commit benchmark

Benchmark: `-window -nomaximize -sound none -bench 60 commandw`, 3 runs per commit. WSL 2.0.9.0 / AMD Ryzen 7 7840HS

| commit | description | mean | std. dev. |
|---|---|---|---|
| 072367deb59bbd361902e7cb3ddf006cea01d7bf | pre-fredyeye cleanup | 502.53% | 2.90% |
| 59ae6c160227e2ae7834edf415072a39a911009e | pre-rewrite | 537.51% | 0.71% |
| 563b63fabf7a06c6dc94b48a1db2f8dba7292c15 | f3 video rewrite | 466.17% | 1.42% |
| 593664483642a5261e9035301602d5112174cfaf | vas cleanup | 455.61% | 3.51% |
| f91b896cda8343fc41f069b32b7ef527364bdea1 | [rebase point] | 467.18% | 2.77% |
| e5e3bd8875d8b6ea87be1d7837d68802632d6d9e | SoA/blend vectorization* | 520.14% | 3.58% |
| 7dcaecd91d2b4843627d7ac58e776eba97f17c53 | AoSoA mistake fix* | 519.07% | 8.68% |
| cbc92f34b83514f127122040d6c565dc2360f612 | merge sprite framebuffers | 526.05% | 0.84% |
| 734879ea3808ec67f8b9ecee80c493336e630345 | tilemap line usage* | 529.54% | 2.75% |
| 49e45bdf87f1e0bbc60bb24e90553622f04c0328 | mix_line ref params | 534.88% | 4.01% |
| 7241a37e768d692651328b5739763b3ef9aa8a4e | text line usage* | 544.14% | 2.16% |
| 7fce16a9741ad53ca5b2da6ba1bf9b7a911cc2f0 | fix extend+alt case | 535.00% | 5.65% |
| 8412c5127291bfac4f78fe3dcf78fe7cd829a6f5 | savestate correctness | 532.07% | 1.80% |
| 53d541de9bf4e9a392e4aa740b2ec24c4e2836e9 | strategic uint or layout jostling? | 541.07% | 3.48% |

* Validated at -O1 by callgrind cycle counting.
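
The mean and std. dev. columns summarize the 3 `-bench` runs per commit. A minimal sketch of that summary, assuming the sample (n - 1) standard deviation; the thread does not say which estimator was used, and `summarize` and `bench_summary` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct bench_summary {
    double mean;
    double stddev;
};

// Mean and sample (n - 1) standard deviation of repeated unthrottled-speed
// results, in percent. Using the sample estimator is an assumption.
bench_summary summarize(const std::vector<double>& speeds) {
    const std::size_t n = speeds.size();
    if (n == 0) return {0.0, 0.0};
    double sum = 0.0;
    for (double s : speeds) sum += s;
    const double mean = sum / static_cast<double>(n);
    double sq = 0.0;
    for (double s : speeds) sq += (s - mean) * (s - mean);
    const double stddev = n > 1 ? std::sqrt(sq / static_cast<double>(n - 1)) : 0.0;
    return {mean, stddev};
}
```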

I found commandw to be a good test case because it does heavy playfield and sprite scaling work in most scenes of its attract sequence; however, it does have a completely blank 6-second boot. As the tables show, most sets recover more unthrottled speed than was lost, and the ones that do not still recover most of it.
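
To put "recover more than was lost" into numbers, one option is the fraction of each set's regression that is won back: (this PR - rewrite) / (pre-rewrite - rewrite), where a value above 1.0 means the set now beats the pre-rewrite build. `recovered_fraction` is a hypothetical helper; the example inputs come from the Ryzen table.

```cpp
// Fraction of a set's regression that was won back:
//   (this_pr - rewrite) / (pre_rewrite - rewrite)
// 1.0 = fully recovered; above 1.0 = faster than the pre-rewrite build.
// Assumes pre_rewrite != rewrite (there was a regression to recover).
double recovered_fraction(double pre_rewrite, double rewrite, double this_pr) {
    return (this_pr - rewrite) / (pre_rewrite - rewrite);
}
```

For example, ridingf (635.20% → 520.65% → 608.64%) gives about 0.77, while gseeker (683.36% → 618.98% → 743.95%) gives about 1.94.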

This system runs slower in general than many other arcade systems in MAME (the test Ryzen here gets ~1700% on ibara and ~4000% on mrdo), but we found that this is not actually due to graphics bottlenecks. From this point, skipping all screen_update work gave only a ~18% increase (about 100 unthrottled percentage points), while disabling the Ensoniq subdevices gave a ~180% increase (about +1000 unthrottled percentage points, to 1635%). The sound section has to emulate 3 or 4 processors, with synchronization between them.
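
A back-of-the-envelope reading of those two measurements, assuming emulation speed is inversely proportional to host time per frame and that each switch removes its share of work entirely: a relative speedup s means the removed work took s / (1 + s) of the original frame time. This Amdahl-style estimate is added for illustration and is not a profiling result from this PR; `removed_time_fraction` is a hypothetical helper.

```cpp
// If removing some work speeds emulation up by a relative amount s
// (0.18 for +18%), and speed is inversely proportional to host time per frame,
// the new frame time is 1 / (1 + s) of the old one, so the removed work
// took 1 - 1 / (1 + s) = s / (1 + s) of the original frame time.
double removed_time_fraction(double relative_speedup) {
    return relative_speedup / (1.0 + relative_speedup);
}
```

With s = 0.18 (skipping screen_update) this gives about 15% of frame time, and with s = 1.8 (disabling the Ensoniq subdevices) about 64%, which fits the observation that graphics is not the main bottleneck.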

y-ack, Apr 26 '24 14:04

Found a pivot layer regression in vertical games; marking as draft again.

y-ack, Apr 30 '24 20:04