OpenApoc
Engine performance in battlescape and ways to improve it
Currently profiling shows that an intense battle (36 anthropods vs base defenses) has processor time spread as follows:

- 50% on rendering, of which:
  - 25% (half of it) rendering map parts
  - 7% on TileObject.getCenter
- 24% on battle.update, of which:
  - 10% updating map parts
  - 7% updating units, of which:
    - 2% unit AI
    - 2% unit movement
I thought about making several code paths parallel, but it seems that won't improve things a lot. How could we improve the most costly part, rendering map parts? If you dig further, it seems to spend most of its time on allocation and deallocation.
And is it worth parallelising stuff like unit AI and map part updates, which together take about 12% of the total load?
I mean, here's what I could do (and it will take a lot of time):
- Right now, unit AI is called when a unit is updated and produces a decision that is executed immediately. Instead, we could run all AI in parallel, store all the decisions, then execute them all in parallel.
- Right now, map parts try to re-link themselves immediately when they die. Instead, we could add all map parts that need re-linking to a set, and then update them after all map parts have been updated. With some other changes (for example, also converting fallen map parts into rubble at the end rather than immediately), this would allow us to run the map part update in parallel.
- Several other cases could probably go parallel by adopting the same model: instead of writing a change immediately, store it somewhere we can write to simultaneously, then gather all the info and process it at once.
Is it worth the time? It looks like no, but I want to confirm. I want OpenApoc to run as well as it can, but at the same time it's no use spending time on optimisation that isn't worth the effort. I wanted to know your opinion on this.
Is the 50% "rendering" actually in the renderer? I remember the iterators for the actual TileObject loop in draw() being a surprisingly large share of the profile time in my quick MSVC tests.
If it is in the renderer (assuming it's using gles30_v2, which is the fastest and best on a new enough GPU), how much time is spent in flush() compared to the SpriteDrawMachine::draw() calls?
No, the render time is mostly not in the renderer itself but in our classes, and when you dig further it's some "allocation" and "deallocation" (I don't know exactly what that means, but that's what you get when you go down the tree).
And this issue was less a complaint about renderer performance and more a question: should I rework the logic to allow multithreading in places where there's not much load anyway?
Multithreading is magic indeed.
Yet its magic only lasts in the beginning, as it scales only with as many cores as you have, which usually means just a 4-8x speedup (still nice), but at the expense of hogging the entire CPU :)
Moreover, once you painfully enable a part to be multi-threaded, you'll bear the cost of it forever (shared data, locks, ...).
A change in algorithm, such as pooling in this case (which seems to be alloc/dealloc driven), might bring much more dramatic improvements.
And HUGE 👍 👍 👍 for your work!