
Engine performance in battlescape and ways to improve it

Open Istrebitel opened this issue 7 years ago • 5 comments

Currently, profiling shows that an intense battle (36 anthropods vs base defenses) has processor time spread as follows:

  • 50% on rendering, of which:
    • 25% (half of the rendering time) on rendering map parts
    • 7% on TileObject.getCenter
  • 24% on battle.update, of which:
    • 10% on updating map parts
    • 7% on updating units, of which:
      • 2% on unit AI
      • 2% on unit movement

I thought about making several code paths parallel, but it seems that won't improve things a lot. How could we improve the most costly part, rendering map parts? Digging further into the profile, most of that time seems to be spent on allocation and deallocation.

And is it worth parallelizing things like unit AI and map part update, which together take only about 12% of the total load?

I mean, here's what I could do (and it will take a lot of time):

  1. Right now, unit AI is called when a unit is updated, and it gives a decision which is executed immediately. Instead, we could run all AI in parallel, store all decisions, and then execute them all (see the sketch after this list).

  2. Right now, map parts try to re-link themselves immediately when they die. Instead, we could add all map parts that need re-linking to a set, and then update them after all map parts have been updated. With a few other changes, such as converting fallen map parts into rubble at the end rather than immediately, this would allow us to run the map part update in parallel.

  3. Several other cases could probably go parallel by adopting the same model: instead of writing a change immediately, store it somewhere we can write to simultaneously, and then gather all the info and process it at once.
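A minimal C++ sketch of that "decide first, apply later" model, assuming unit AI only reads game state while deciding. `Unit`, `AIDecision`, `computeDecision` and `applyDecision` are placeholder names for illustration, not actual OpenApoc types:

```cpp
#include <future>
#include <vector>

struct AIDecision { /* e.g. target, movement order, item to use */ };
struct Unit
{
    AIDecision computeDecision() const;      // pure read of game state, no writes
    void applyDecision(const AIDecision &d); // mutates game state
};

void updateUnitAI(std::vector<Unit> &units)
{
    // Phase 1: every unit only reads shared state, so the calls are independent
    // and each task writes to its own pre-allocated slot (no locks needed).
    std::vector<AIDecision> decisions(units.size());
    std::vector<std::future<void>> tasks;
    tasks.reserve(units.size());
    for (size_t i = 0; i < units.size(); ++i)
    {
        tasks.push_back(std::async(std::launch::async,
            [&units, &decisions, i] { decisions[i] = units[i].computeDecision(); }));
    }
    for (auto &t : tasks)
        t.wait();

    // Phase 2: apply the stored decisions. Done serially here because applying
    // them mutates shared state; it could only be parallelised too if the
    // writes were proven independent.
    for (size_t i = 0; i < units.size(); ++i)
        units[i].applyDecision(decisions[i]);
}
```

Launching one task per unit is only for illustration; a real change would chunk units across a fixed pool of worker threads to avoid per-task overhead.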

Is it worth the time? It looks like it isn't, but I want to confirm. I want OpenApoc to run as well as it can, but at the same time it's no use spending time on optimisation that isn't worth the effort. I wanted to hear your opinions on this.

Istrebitel avatar Sep 15 '17 17:09 Istrebitel

Is the 50% "rendering" actually in the renderer? I remember the iterators for the actual TileObject loop in draw() being a surprisingly large amount of the profile time in my quick msvc tests.

If it is in the renderer - assuming it's using gles30_v2, which is the fastest and best option on a new enough GPU - how much time is spent in flush() compared to the SpriteDrawMachine::draw() calls?
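One cheap way to answer that without a full profiler is a pair of scoped timers around the two call sites. A rough sketch, where the wrapped calls are just placeholders for wherever the gles30_v2 flush()/draw() calls actually live:

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>

struct ScopedTimer
{
    explicit ScopedTimer(std::atomic<long long> &bucket)
        : bucketNs(bucket), start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer()
    {
        auto end = std::chrono::steady_clock::now();
        bucketNs += std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    }
    std::atomic<long long> &bucketNs;
    std::chrono::steady_clock::time_point start;
};

std::atomic<long long> flushNs{0}, spriteDrawNs{0};

// Wrap the two call sites of interest, e.g.:
//   { ScopedTimer t{flushNs}; renderer.flush(); }
//   { ScopedTimer t{spriteDrawNs}; spriteDrawMachine.draw(...); }
// then dump the totals once per frame or on exit:
void dumpTimers()
{
    std::printf("flush: %lld ms, sprite draw: %lld ms\n",
                flushNs.load() / 1000000, spriteDrawNs.load() / 1000000);
}
```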

JonnyH avatar Sep 15 '17 18:09 JonnyH

No, rendering time is not mostly in the renderer, it's in our classes, and when you dig further it's some "allocation" and "deallocation" (I don't know exactly what that means, but that's what you get when you go down the call tree).

Istrebitel avatar Sep 16 '17 08:09 Istrebitel

And this issue was less a complaint about renderer performance, and more a question: should I rework the logic to allow multithreading in places where there doesn't seem to be much load anyway?

Istrebitel avatar Sep 16 '17 09:09 Istrebitel

Multithreading is magic indeed.

Yet its magic only goes so far: it scales with however many cores you have, which usually means only a 4-8x speedup (still nice), but at the expense of hogging the entire CPU :)

Moreover, once you painfully enable a part to be multi-threaded, you'll bear the cost of it forever (shared data, locks, ...).

A change in algorithm, such as pooling in this case (which seems to be alloc/dealloc driven), might bring much more dramatic improvements.
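For illustration, a minimal per-frame pool, assuming the hot allocations are small per-draw helper objects; `TileDrawItem` here is a hypothetical stand-in, not an OpenApoc class:

```cpp
#include <memory>
#include <vector>

struct TileDrawItem { /* per-sprite draw data */ };

class TileDrawItemPool
{
  public:
    // Returns a reusable object, allocating only the first time a slot is needed.
    TileDrawItem *acquire()
    {
        if (next_ == items_.size())
            items_.push_back(std::make_unique<TileDrawItem>());
        return items_[next_++].get();
    }
    // Call once per frame: objects are kept around and reused next frame
    // instead of being freed.
    void reset() { next_ = 0; }

  private:
    std::vector<std::unique_ptr<TileDrawItem>> items_;
    size_t next_ = 0;
};
```

Because acquire() hands back a previously used object, the caller has to reinitialise any fields it relies on; in exchange, steady-state frames do no heap allocation at all.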

steveschnepp avatar Sep 18 '17 07:09 steveschnepp

And HUGE 👍 👍 👍 for your work!

steveschnepp avatar Sep 18 '17 07:09 steveschnepp