mame taito/taito_f3_v.cpp: regain performance after major rewrite

addresses my own concerns about the #11811 speed regression against the previous implementation.

switch z buffers and per-pixel blend info from AoS to SoA
allow vectorization of the line blending operation
regain the empty line optimization by tracking tilemap row usage
consolidate sprite framebuffers (we still pull from it multiple times for each sprite priority group)
other minor wins from safe logic reorderings
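
as a rough illustration of the SoA layout and the row usage tracking listed above, here is a minimal standalone sketch. it is not the code from this pr: every name (`line_buffers`, `layer_sketch`, `mix_line_sketch`, `mix_layer_sketch`), the toy dimensions, and the "pen 0 is transparent" rule are assumptions made up for the example.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// toy dimensions, chosen only to keep the example small (hypothetical)
constexpr std::size_t kWidth = 64;
constexpr std::size_t kHeight = 8;

// SoA: one contiguous array per field instead of one struct per pixel,
// so the mixing loop reads and writes plain arrays
struct line_buffers {
    std::array<std::uint8_t, kWidth> prio{};   // priority of the pixel drawn so far
    std::array<std::uint16_t, kWidth> color{}; // pen of the pixel drawn so far
};

using frame_sketch = std::array<line_buffers, kHeight>;

struct layer_sketch {
    std::array<std::array<std::uint16_t, kWidth>, kHeight> pixels{};
    std::bitset<kHeight> row_used; // set once a row receives an opaque pen

    void plot(std::size_t x, std::size_t y, std::uint16_t pen) {
        assert(x < kWidth && y < kHeight);
        pixels[y][x] = pen;
        if (pen != 0) // assumption: pen 0 means transparent
            row_used.set(y);
    }
};

// mix one source line into the destination; written as selects rather than
// branches so a compiler has a chance to vectorize it (not guaranteed)
inline void mix_line_sketch(line_buffers& dst,
                            const std::array<std::uint16_t, kWidth>& src,
                            std::uint8_t layer_prio) {
    for (std::size_t x = 0; x < kWidth; ++x) {
        const bool win = (src[x] != 0) && (layer_prio > dst.prio[x]);
        dst.color[x] = win ? src[x] : dst.color[x];
        dst.prio[x] = win ? layer_prio : dst.prio[x];
    }
}

// mix a whole layer, skipping rows that were never drawn to;
// returns how many lines were actually mixed
inline std::size_t mix_layer_sketch(frame_sketch& dst, const layer_sketch& src,
                                    std::uint8_t layer_prio) {
    std::size_t mixed = 0;
    for (std::size_t y = 0; y < kHeight; ++y) {
        if (!src.row_used[y])
            continue; // empty line: nothing to blend
        mix_line_sketch(dst[y], src.pixels[y], layer_prio);
        ++mixed;
    }
    return mixed;
}
```

plotting a single opaque pixel into a `layer_sketch` and calling `mix_layer_sketch` touches exactly one line; every other row is skipped without reading its pixels. whether the inner loop actually vectorizes depends on the compiler and flags, so treat that part as something to check in the generated code rather than assume.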

Apr 26 '24 14:04 y-ack

-window -nomaximize -bench 240 <set> of 1 Windows 11 / CI (Windows) / AMD Ryzen 7 7840HS

set	e967a70 pre-rewrite	563b63fabf7a06c6dc94b48a1db2f8dba7292c15 rewrite	55c60e5 this pr
ringrage	606.71%	533.30%	630.98%
arabianm	685.52%	574.87%	695.64%
ridingf	635.20%	520.65%	608.64%
gseeker	683.36%	618.98%	743.95%
commandw	630.25%	560.72%	634.03%
hthero93	717.49%	588.87%	710.68%
scfinals	694.80%	587.99%	720.88%
trstar	690.49%	578.01%	706.09%
gunlock	609.94%	553.95%	639.41%
lightbr	668.88%	545.62%	651.80%
kaiserkn	637.06%	558.43%	649.98%
dariusg	724.94%	580.85%	733.24%
bubsymphj	686.23%	533.20%	646.04%
spcinvdj	721.39%	577.61%	729.34%
hthero95	667.58%	559.33%	681.61%
qtheater	692.09%	535.16%	660.61%
elvactr	738.29%	611.27%	736.44%
spcinv95	670.14%	561.78%	655.50%
twinqix	721.19%	581.23%	699.54%
tcobra2	639.72%	544.45%	607.79%
bubblem	616.16%	570.59%	661.85%
cleopatr	593.05%	497.74%	606.97%
arkretrn	599.81%	525.94%	599.10%
kirameki	698.47%	570.16%	673.45%
puchicar	585.04%	511.76%	591.27%
popnpop	598.05%	514.02%	606.06%
landmakr	735.18%	578.79%	700.83%

Windows 10 / CI (Windows) / Intel Core i5-7300U

set	e967a70 pre-rewrite	563b63fabf7a06c6dc94b48a1db2f8dba7292c15 rewrite	55c60e5 this pr
ringrage	289.16%	248.27%	295.81%
arabianm	313.46%	272.34%	333.73%
ridingf	285.56%	226.03%	263.36%
gseeker	299.63%	283.37%	337.42%
commandw	277.50%	249.16%	282.81%
dariusg	322.64%	277.24%	332.32%
bubblem	294.36%	261.99%	302.12%
kirameki	311.88%	274.02%	312.47%
puchicar	251.74%	206.12%	267.41%

per-commit benchmark

-window -nomaximize -sound none -bench 60 commandw of 3 WSL 2.0.9.0 / AMD Ryzen 7 7840HS

commit	description	mean	std.dev.
072367deb59bbd361902e7cb3ddf006cea01d7bf	pre-fredyeye cleanup	502.53%	2.90%
59ae6c160227e2ae7834edf415072a39a911009e	pre-rewrite	537.51%	0.71%
563b63fabf7a06c6dc94b48a1db2f8dba7292c15	f3 video rewrite	466.17%	1.42%
593664483642a5261e9035301602d5112174cfaf	vas cleanup	455.61%	3.51%
f91b896cda8343fc41f069b32b7ef527364bdea1	[rebase point]	467.18%	2.77%
e5e3bd8875d8b6ea87be1d7837d68802632d6d9e	SoA/blend vectorization*	520.14%	3.58%
7dcaecd91d2b4843627d7ac58e776eba97f17c53	AoSoA mistake fix*	519.07%	8.68%
cbc92f34b83514f127122040d6c565dc2360f612	merge sprite framebuffers	526.05%	0.84%
734879ea3808ec67f8b9ecee80c493336e630345	tilemap line usage*	529.54%	2.75%
49e45bdf87f1e0bbc60bb24e90553622f04c0328	mix_line ref params	534.88%	4.01%
7241a37e768d692651328b5739763b3ef9aa8a4e	text line usage*	544.14%	2.16%
7fce16a9741ad53ca5b2da6ba1bf9b7a911cc2f0	fix extend+alt case	535.00%	5.65%
8412c5127291bfac4f78fe3dcf78fe7cd829a6f5	savestate correctness	532.07%	1.80%
53d541de9bf4e9a392e4aa740b2ec24c4e2836e9	strategic uint or layout jostling?	541.07%	3.48%

* validated in -O1 by callgrind cycle counting

i found commandw to be a good test case because it does heavy playfield and sprite scaling work for most scenes in its attract sequence; however, it does have a completely blank 6 second boot. as shown, most sets recover more unthrottled speed than was lost, and the ones that do not still recover most of it.

this system runs slower in general than many other arcade systems in MAME (the test ryzen here gets ~1700% on ibara, 4000% on mrdo), but we found that this is not actually due to graphics bottlenecks. skipping all screen_update work, from here, achieved only a ~18% increase (~100 unthrottled percentage points), while disabling the ensoniq subdevices resulted in a ~180% increase (+1000 unthrottled percentage points, to 1635%). it has to emulate something like 3 or 4 processors in there, with synchronization.

Apr 26 '24 14:04 y-ack

pivot layer regression found in vertical games, marking as draft again.

Apr 30 '24 20:04 y-ack
