xash3d-fwgs
4x~9x VBO generating/uploading performance improvement
While looking into the R_GenerateVBO function, I noticed that it wastes a lot of time checking the same data again and again. After reordering the loops so that useless surfaces are excluded first, level loading times improved significantly.
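To illustrate the idea, here is a simplified sketch, not the actual patch. The `surf_t` struct and both functions are hypothetical stand-ins rather than engine code. It assumes the slow path rescans every surface for each lightmap/texture pair, while the reordered path drops unusable surfaces once and then scans the shorter list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified surface record; not the engine's real surface type. */
typedef struct {
    int texture;   /* texture index */
    int lightmap;  /* lightmap index */
    int drawable;  /* 0 for surfaces the VBO path never uses */
    int numverts;  /* vertices contributed by this surface */
} surf_t;

/* Naive order: every (lightmap, texture) pair rescans all surfaces and
 * re-tests the same drawable flag. Returns the number of surface checks. */
static size_t count_naive(const surf_t *s, size_t n, int nlm, int ntex, int *verts_out)
{
    size_t checks = 0;
    int verts = 0;

    assert(s != NULL && verts_out != NULL);
    for (int k = 0; k < nlm; k++)
        for (int j = 0; j < ntex; j++)
            for (size_t i = 0; i < n; i++) {
                checks++;
                if (!s[i].drawable)
                    continue;
                if (s[i].lightmap != k || s[i].texture != j)
                    continue;
                verts += s[i].numverts;
            }
    *verts_out = verts;
    return checks;
}

/* Reordered: filter out unusable surfaces once, then loop over survivors only.
 * kept must have room for n indices. */
static size_t count_filtered(const surf_t *s, size_t n, int nlm, int ntex,
                             size_t *kept, int *verts_out)
{
    size_t checks = 0, nkept = 0;
    int verts = 0;

    assert(s != NULL && kept != NULL && verts_out != NULL);
    for (size_t i = 0; i < n; i++) {
        checks++;
        if (s[i].drawable)
            kept[nkept++] = i;
    }
    for (int k = 0; k < nlm; k++)
        for (int j = 0; j < ntex; j++)
            for (size_t m = 0; m < nkept; m++) {
                const surf_t *p = &s[kept[m]];
                checks++;
                if (p->lightmap != k || p->texture != j)
                    continue;
                verts += p->numverts;
            }
    *verts_out = verts;
    return checks;
}
```

Both functions produce the same vertex total; the filtered variant simply performs fewer checks when many surfaces are unusable.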
I didn't notice any regressions, and FPS unexpectedly improved as well, though within margin of error. Comparing logs on heavy maps like ad_sepulcher.bsp shows that the VBOs got reordered. I'm not sure what could have caused it, so I'm asking @mittorn for review, since he probably knows his own code better.
All comparisons were made on an AMD Ryzen 2600X and an NVIDIA GeForce GTX 1070, with the engine compiled with the default build configuration plus bsp2 support.
Attaching the ad_sepulcher.bsp loading log from before and after the patch:
before
[22:19:16] Note: R_GenerateVBO: allocated array of 65532 verts, texture 16, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65533 verts, texture 30, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65531 verts, texture 39, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65533 verts, texture 76, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65532 verts, texture 6, lm 1
[22:19:16] Note: R_GenerateVBO: allocated array of 65535 verts, texture 17, lm 1
[22:19:16] Note: R_GenerateVBO: allocated array of 65535 verts, texture 36, lm 1
[22:19:16] Note: R_GenerateVBO: allocated array of 65534 verts, texture 58, lm 1
[22:19:17] Note: R_GenerateVBO: allocated array of 65533 verts, texture 160, lm 1
[22:19:17] Note: R_GenerateVBO: allocated array of 65532 verts, texture 13, lm 2
[22:19:17] Note: R_GenerateVBO: allocated array of 65531 verts, texture 30, lm 2
[22:19:17] Note: R_GenerateVBO: allocated array of 65534 verts, texture 94, lm 2
[22:19:17] Note: R_GenerateVBO: allocated array of 26935 verts in 0.821 seconds
[22:19:18] Note: R_GenerateVBO: uploaded VBOs in 0.883 seconds, 1.7 seconds total
after
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 107, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65533 verts, texture 17, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 13, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65532 verts, texture 41, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65532 verts, texture 53, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65534 verts, texture 30, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65532 verts, texture 152, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65533 verts, texture 13, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65533 verts, texture 35, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 53, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 108, lm 2
[22:16:31] Note: R_GenerateVBO: allocated array of 65534 verts, texture 13, lm 2
[22:16:31] Note: R_GenerateVBO: allocated array of 65534 verts, texture 19, lm 2
[22:16:31] Note: R_GenerateVBO: allocated array of 26926 verts in 0.139 seconds
[22:16:31] Note: R_GenerateVBO: uploaded VBOs in 0.0522 seconds, 0.191 seconds total
FPS before and after the patch: 137-140 and 140-145.
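For reference, the ratios implied by the ad_sepulcher.bsp timings can be worked out with a tiny helper. This is only arithmetic on the logged numbers; `speedup` is a hypothetical name used for illustration.

```c
#include <assert.h>

/* How many times faster "after" is than "before" (both in seconds). */
static double speedup(double before_s, double after_s)
{
    assert(after_s > 0.0);
    return before_s / after_s;
}

/* Inputs taken from the ad_sepulcher.bsp log:
 *   generation: speedup(0.821, 0.139)   about 5.9x
 *   upload:     speedup(0.883, 0.0522)  about 16.9x
 *   total:      speedup(1.7,   0.191)   about 8.9x
 */
```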
On a smaller map like disposal.bsp, 0.586 seconds were spent on VBOs before the patch and 0.0231 seconds after, but there was no frametime difference.
Hazard Course timedemo:
before
[22:48:14] Program args: ./xash3d -timedemo bench -rodir ../roXash -dev 2
[22:48:14] Note: R_GenerateVBO: allocated array of 30188 verts in 0.0145 seconds
[22:48:14] Note: R_GenerateVBO: uploaded VBOs in 0.0177 seconds, 0.0322 seconds total
[22:48:17] Note: R_GenerateVBO: allocated array of 30452 verts in 0.0139 seconds
[22:48:17] Note: R_GenerateVBO: uploaded VBOs in 0.0186 seconds, 0.0325 seconds total
[22:48:19] Note: R_GenerateVBO: allocated array of 22568 verts in 0.00904 seconds
[22:48:19] Note: R_GenerateVBO: uploaded VBOs in 0.0126 seconds, 0.0217 seconds total
[22:48:20] Note: R_GenerateVBO: allocated array of 15892 verts in 0.00366 seconds
[22:48:20] Note: R_GenerateVBO: uploaded VBOs in 0.00612 seconds, 0.00978 seconds total
[22:48:23] Note: R_GenerateVBO: allocated array of 18266 verts in 0.0058 seconds
[22:48:23] Note: R_GenerateVBO: uploaded VBOs in 0.00748 seconds, 0.0133 seconds total
[22:48:24] Note: R_GenerateVBO: allocated array of 10492 verts in 0.00267 seconds
[22:48:24] Note: R_GenerateVBO: uploaded VBOs in 0.00412 seconds, 0.00679 seconds total
[22:48:26] timedemo result: 21066 frames 12.102 seconds 1740.715 fps
after
[22:46:25] Program args: ./xash3d -timedemo bench -rodir ../roXash -dev 2
[22:46:26] Note: R_GenerateVBO: allocated array of 30188 verts in 0.00284 seconds
[22:46:26] Note: R_GenerateVBO: uploaded VBOs in 0.00351 seconds, 0.00635 seconds total
[22:46:28] Note: R_GenerateVBO: allocated array of 30452 verts in 0.00276 seconds
[22:46:28] Note: R_GenerateVBO: uploaded VBOs in 0.00261 seconds, 0.00537 seconds total
[22:46:30] Note: R_GenerateVBO: allocated array of 22568 verts in 0.00182 seconds
[22:46:30] Note: R_GenerateVBO: uploaded VBOs in 0.00227 seconds, 0.00408 seconds total
[22:46:31] Note: R_GenerateVBO: allocated array of 15892 verts in 0.00134 seconds
[22:46:31] Note: R_GenerateVBO: uploaded VBOs in 0.00221 seconds, 0.00356 seconds total
[22:46:34] Note: R_GenerateVBO: allocated array of 18266 verts in 0.00154 seconds
[22:46:34] Note: R_GenerateVBO: uploaded VBOs in 0.00197 seconds, 0.00352 seconds total
[22:46:35] Note: R_GenerateVBO: allocated array of 10492 verts in 0.00112 seconds
[22:46:35] Note: R_GenerateVBO: uploaded VBOs in 0.00178 seconds, 0.00291 seconds total
[22:46:37] timedemo result: 21066 frames 11.678 seconds 1803.883 fps
Though the FPS difference here might be caused by the overall better loading times.