Explore opportunities for improving performance
Not to be taken as criticism; what has been built in a short amount of time is amazing and we haven't focused on performance yet, so it's more of a future goal.
Currently, using the SW renderer maxes out the CPU at times. While the frame rate is decent most of the time, I see it drop to 15-30 fps in certain situations on devices like a Pixel 7 Pro, with the battery draining fast.
Let's try to find ways to make the rendering more efficient (both SW/HW).
I mean, if we're rendering on the CPU, it's expected. Currently the "GPU" backend doesn't do much: it can't do textures, lighting, etc. Once the GPU backend works, hopefully it's "good enough", however I still think we should try some optimizations just because.
I wouldn't say it's expected. I'm sure there's tons of optimization potential. A modern CPU shouldn't be maxed out while still dropping below 30fps rendering LEGO Island.
I see your point. Did you do a debug or release build? When I did -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="-g" -DISLE_USE_ASAN -DISLE_USE_UBSAN, the game ran at ~15-30 FPS on my Ryzen 7 5800X. On a Release build, no code paths, no ASAN/UBSAN, the game ran at almost my monitor's refresh rate (100 Hz); it was ~80 FPS (I got the FPS metrics using the Steam overlay).
Release build of course. Debug performance isn't relevant. I do get decent performance most of the time, but random frame drops occasionally. I'm mostly testing on mobile devices.
I left 3 comments in the code regarding optimizations that are very doable, and there should be plenty more possible. Rendering is currently written in the easiest way possible rather than the fastest, though I did try to be efficient when trivially possible.
But there are also other opportunities, like rendering by group and using a bounding box for discarding rather than evaluating every triangle. Then we can also cache the transforms of objects that don't move between frames, and split up rendering by group type (solid, solid + material, textured, transparent) so that each can have its own optimized path that avoids branching in the pixel loop. This will also allow rendering transparent objects last so they always appear visually correct. That would also allow moving texture and specular information out of the vertex data. And on the GPU side it would allow much of the transformation to happen on the GPU, and it is required for working transparency.
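To make the group idea a bit more concrete, here is a minimal sketch of discarding by bounding box and ordering by group type. All names (`GroupType`, `ScreenBox`, `DrawGroup`, `BuildDrawList`) are made up for illustration and are not taken from the codebase; it assumes each group already has a projected screen-space bounding box.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical group categories, ordered so that transparent groups draw last.
enum class GroupType { Solid = 0, SolidMaterial = 1, Textured = 2, Transparent = 3 };

// Screen-space bounding box of a group after projection (inclusive pixel coordinates).
struct ScreenBox {
    int minX, minY, maxX, maxY;
};

struct DrawGroup {
    GroupType type;
    ScreenBox box;
    int id; // stand-in for the group's triangle data
};

// True if the box overlaps a viewport of the given size.
inline bool OverlapsViewport(const ScreenBox& b, int width, int height)
{
    return b.maxX >= 0 && b.maxY >= 0 && b.minX < width && b.minY < height;
}

// Drops groups that are entirely off screen, then orders the rest by type so
// each type can be drawn by its own loop and transparent groups come last.
std::vector<DrawGroup> BuildDrawList(const std::vector<DrawGroup>& groups, int width, int height)
{
    assert(width > 0 && height > 0);
    std::vector<DrawGroup> visible;
    for (const DrawGroup& g : groups) {
        if (OverlapsViewport(g.box, width, height)) {
            visible.push_back(g);
        }
    }
    // stable_sort keeps the original submission order within each type.
    std::stable_sort(visible.begin(), visible.end(), [](const DrawGroup& a, const DrawGroup& b) {
        return static_cast<int>(a.type) < static_cast<int>(b.type);
    });
    return visible;
}
```

With a list like this, one whole group is rejected with four comparisons instead of testing each of its triangles, and the per-type loops would not need to branch on material or texture inside the pixel loop.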
Considering that the game currently requires SIMD support to run in software mode, it might also be an idea to apply some SIMD optimizations.
There are probably plenty more optimizations possible, but profiling would be needed to figure out what else needs optimizing. Pixel format conversion also happens quite frequently, which is a fairly big drain; possibly an optimized inline conversion would help. Making the game run in RGB888 would also improve performance, or at least having the rendering do so.
Multithreading would also be a huge win, though it's not trivial to maintain z-buffering when doing so.
Release -O3 LTO allows me to run the game at 90 fps for the most part, but the CPU core is also running at 90-95%.
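For reference, an inline conversion could look roughly like the sketch below. It assumes the source pixels are RGB565 and the target is XRGB8888, which is my assumption for illustration and may not match the formats the renderer actually converts between; the function names are hypothetical too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Expands one RGB565 pixel to XRGB8888 (0x00RRGGBB). The high bits of each
// channel are replicated into the low bits so that full intensity maps to 0xFF.
inline std::uint32_t Rgb565ToXrgb8888(std::uint16_t p)
{
    std::uint32_t r = (p >> 11) & 0x1F;
    std::uint32_t g = (p >> 5) & 0x3F;
    std::uint32_t b = p & 0x1F;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}

// Converts a row of pixels; a simple loop like this is also a candidate for
// compiler auto-vectorization.
inline void ConvertRow(const std::uint16_t* src, std::uint32_t* dst, std::size_t count)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = Rgb565ToXrgb8888(src[i]);
    }
}
```

If rendering ran in the output format directly, a pass like this would not be needed at all, which is the point of the RGB888 suggestion.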
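One common way around the z-buffer problem is to give each thread its own horizontal band of the frame, so no two threads ever touch the same depth or color entry. The sketch below only illustrates that partitioning and is not how the renderer is structured today; `Fragment`, `Frame`, `ShadeBand` and `RenderBands` are hypothetical names, and real code would rasterize triangles per band instead of walking a list of pre-computed samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical pre-rasterized sample: position, depth and color.
struct Fragment {
    int x, y;
    float z;
    unsigned color;
};

struct Frame {
    int width, height;
    std::vector<float> depth;    // smaller z is closer
    std::vector<unsigned> color;
    Frame(int w, int h)
        : width(w), height(h),
          depth(static_cast<std::size_t>(w) * h, 1.0f),
          color(static_cast<std::size_t>(w) * h, 0u)
    {
    }
};

// Depth-tests and writes only the fragments whose row lies in [y0, y1).
void ShadeBand(Frame& f, const std::vector<Fragment>& frags, int y0, int y1)
{
    for (const Fragment& s : frags) {
        if (s.y < y0 || s.y >= y1 || s.x < 0 || s.x >= f.width) {
            continue;
        }
        std::size_t i = static_cast<std::size_t>(s.y) * f.width + s.x;
        if (s.z < f.depth[i]) {
            f.depth[i] = s.z;
            f.color[i] = s.color;
        }
    }
}

// Splits the frame into row bands and processes each band on its own thread.
// Bands are disjoint, so the depth test needs no locking.
void RenderBands(Frame& f, const std::vector<Fragment>& frags, int bands)
{
    assert(bands > 0);
    int rows = (f.height + bands - 1) / bands;
    std::vector<std::thread> workers;
    for (int b = 0; b < bands; ++b) {
        int y0 = b * rows;
        int y1 = std::min(f.height, y0 + rows);
        if (y0 >= y1) {
            break;
        }
        workers.emplace_back(ShadeBand, std::ref(f), std::cref(frags), y0, y1);
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

The trade-off is that triangles crossing a band boundary get set up more than once, and spawning threads every frame is wasteful, so a persistent worker pool would be the next step if this turned out to be worth it.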
> Release build of course. Debug performance isn't relevant. I do get decent performance most of the time, but random frame drops occasionally. I'm mostly testing on mobile devices.
Native or web builds?
When turning around, I notice some frame drops when it's loading all the geometry and buildings, even on an M1 Pro.
Is there a drop or is the low frame rate just more noticeable when turning?
Definitely more noticeable when turning.
Enable the FPS counter and get some real numbers.
Both SDL_GPU and OpenGL are now in good shape. The software renderer is a bit better but still has some opportunities for improvement. The big performance killer left is that 2D is done on the CPU, which requires a round trip from the GPU and is not fast to begin with.
Performance in the software renderer is now better than the original. There are still some fairly easy optimizations that can be done in terms of lighting, and multithreading might still be worth exploring.
After https://github.com/isledecomp/isle-portable/pull/348, portable should generally perform better than the original on most systems.
After that, the only things left are related to how the game itself works, and these are best addressed by someone able to test against DX5:
- Only upload textures once and swap them instead of updating a texture during animations
- Don't write to the frame buffer directly for speedometer / gas meter / blue book wiggle, use a surface with the right size (and crop for the meters).
The rest will be low-level performance optimizations, likely using a profiler, and maybe cutting out the dead code from the game.
Strangely, surfaces get rendered in horizontal spans when there is CPU blitting going on. This doesn't seem to help anything and very likely hurts performance while making things more complex.
A lot of work has gone into improving performance already (thanks @AJenbo ) so I think we can close this. There are always more opportunities and other ways to improve things, but let's create individual issues for targeted improvements if we see / want them.
I'll take a look at some of the CPU-based rendering once I get WebGL working correctly.